PREFACE


The present work had its beginnings in a series of papers 
published jointly some years ago by Dr Dorothy Wrinch and 
myself. Both before and since that time several books pur- 
porting to give analyses of the principles of scientific inquiry 
have appeared, but it seems to me that none of them gives 
adequate attention to the chief guiding principle of both 
scientific and everyday knowledge : that it is possible to learn 
from experience and to make inferences from it beyond the 
data directly known by sensation. Discussions from the 
philosophical and logical point of view have tended to the con- 
clusion that this principle cannot be justified by logic alone, 
which is true, and have left it at that. In discussions by physi- 
cists, on the other hand, it hardly seems to be noticed that 
such a principle exists. In the present work the principle is 
frankly adopted as a primitive postulate and its consequences 
are developed. It is found to lead to an explanation and a 
justification of the high probabilities attached in practice to 
simple quantitative laws, and thereby to a recasting of the 
processes involved in description. As illustrations of the 
actual relations of scientific laws to experience it is shown how 
the sciences of mensuration and dynamics may be developed. 
I have been stimulated to an interest in the subject myself on 
account of the fact that in my work in the subjects of cosmo- 
gony and geophysics it has habitually been necessary to apply 
physical laws far beyond their original range of verification in 
both time and distance, and the problems involved in such 
extrapolation have therefore always been prominent. 

My thanks are due to the staff of the Cambridge Univer- 
sity Press for their care and courtesy ; also to Dr Wrinch and 
Mr M. H. A. Newman, who have read the whole in proof 
and suggested many improvements. 

HAROLD JEFFREYS 

ST JOHN’S COLLEGE 
CAMBRIDGE 
January 1931 




CHAPTER I 


LOGIC AND SCIENTIFIC INFERENCE 


“Contrariwise”, continued Tweedledee, “if it was so, it might be;
and if it were so, it would be: but as it isn’t, it ain’t. That’s logic.”

Lewis Carroll, 
Through the Looking Glass 


1·1. The fundamental problem of this work is the question
of the nature of scientific inference. The data available to the 
scientific worker, as well as to the man in the street, are com- 
posed of two classes. The first class consists of the crude data 
provided by the senses. These will be called sensations. The 
second class consists of general principles, which determine 
how the information provided by the senses is to be treated. 
It is actually treated in two different ways, which may be
called description and inference. Description, in the strict 
sense, would involve only the cataloguing and classification of 
sensations already experienced. Inference is the use of sen- 
sations already experienced to derive information about sen- 
sations not yet experienced, to construct physical objects, and 
to describe the past and future of these physical objects. For 
pure description only an application of the principles of 
classification and the properties of classes is required; these 
are purely logical ideas. 

Inference requires much more. However fully one's past 
experience has been described and indexed, nothing not in- 
cluded in it can be inferred without some principles not 
purely logical in character. As a matter of logic this is a 
commonplace. Actually one proceeds, in the simplest type of 
inference, on the supposition that what has been found to 
be true in previous instances will be repeated in new in- 
stances. The distinction between deductive logic and scientific 
inference may be illustrated by means of one of the classical 
instances of the former. 

All men are mortal. 

Socrates is a man. 

Therefore Socrates is mortal. 

This type of argument, the syllogism, is one of those chiefly 
used in pure logic ; indeed it was believed for ages that there 
was no other. The first, or general, statement about all men 
is called the major premiss, the particular statement that 
Socrates is a man is called the minor premiss, and from the two 
together we draw the conclusion that Socrates is mortal. But 
as a scientific argument it is unsatisfactory. We question im- 
mediately whether the major premiss is true. It is not known 
by experience. We cannot state as a result of experience that 
a man is mortal until he is dead. At any instant men are 
living, and they all constitute unverified instances of the 
premiss; it is simply unknown by experience whether the 
general statement is true or not. Gulliver, arriving in 
Luggnagg, might have said equally well : 

All men are mortal.

A Struldbrug is a man. 

Therefore a Struldbrug is mortal. 

But Gulliver knew better and did not argue with his informers. 

There are several ways of treating the classical syllogism so 
as to make it somewhat more acceptable to scientific thought. 
One is to say that the general proposition is not asserted from 
experience at all, but is known to be true in all possible cases 
from previous knowledge. In such a case the syllogism be- 
comes valid. But we avoid the difficulty only by admitting 
that there may be knowledge applicable to the study of ex- 
perience and not itself derived from experience. This type of 
knowledge we call a priori. We do not say that it is the solution 
of the present difficulty, but a priori knowledge exists, and 
we shall have occasion later to consider instances of it at 
length. 

The word mortal itself introduces difficulties of a type that
will concern us later. Suppose that we accepted the syllogism 
and that Socrates had nevertheless survived till the present 
day. We should still not be compelled to reject the conclusion 
of the syllogism. If a doubter pointed out that Socrates had 
reached the advanced age of 2000 odd, that would not in the 
least prevent us from continuing to assert his mortality. Our 
reply would be that he might die to-morrow as far as the 
doubter knew; and that would close the matter unless the 
doubter thought of a new line of attack. But suppose he went 
on : “You are saying that Socrates will not live for ever. May
I point out that even if he lives to be a million years old it 
could still be said that he would die some day? Your state- 
ment has the quality that no evidence could possibly be pro- 
duced that would contradict it. Even if it is true it still gives 
no reason to suppose that a man cannot live till he is a million 
years old. In fact it is vague and unverifiable, and therefore 
uninteresting”. The doubter has at this stage abandoned the
attempt to show that the deduction has been falsified by ex- 
perience; he says instead that it is futile because it is not 
capable of being compared with experience. This is the 
scientific attitude. 

Both these criticisms of the classical syllogism have ana- 
logues in relation to certain modern theories of scientific 
knowledge, as we shall see later. 

1·2. An essential object of scientific inference is to increase
knowledge. The syllogism has a place in it just so far as it 
assists this object. Some syllogisms do; others do not. 
Consider the following example : 

All English policemen are over five feet nine inches 
in height. 

Brown is an English policeman. 

Therefore Brown is over five feet nine inches in height. 

The syllogism as it stands is perfect. The conclusion is free
from the difficulties of that of the classical syllogism; it is
perfectly possible to measure Brown’s height. But how do 
we know the general proposition? If it is known by experi- 
ence, we have already measured the heights of all English 
policemen, and therefore we have measured P.C. Brown in 
particular, and we know directly that his height is over five 
feet nine inches. The major premiss, in fact, contains the 
conclusion, and the syllogism tells us nothing that we did 
not know already. But suppose that we have not made ex- 
tensive measurements of the heights of policemen, but that 
we know of the official regulation that no man is appointed to 
be a policeman unless his height is at least five feet nine. The 
general proposition is now part of our knowledge without 
having been verified in all its instances ; it is previous know- 
ledge. The inference concerning P.C. Brown is now new 
knowledge ; the syllogism tells us something to expect about 
his appearance when we meet him that we should not have 
known without it. Thus the same syllogism may or may not 
provide new knowledge, according to the means of knowing 
its premisses. 

The syllogism about Socrates raises the same question in a 
more complicated form. Its author may have had previous 
intuitive or divinely revealed knowledge, independent of ex- 
perience, that all men were mortal. If so, he could construct 
his syllogism and derive new knowledge about the particular 
man Socrates. But this is not the practical case; belief in 
human mortality is based on experience. A contemporary of 
Socrates might proceed in the following way. He would sum- 
marize what he knew of the duration of human life. No case 
was known of a man’s having lived for 200 years, and few for 
100. This suggests a general rule: all men die before reaching 
200 years of age, most before reaching 100. He might look for 
exceptions among living persons, of whom few were over 100 
and none over 200. The general rule was verified with regard 
to all dead persons, and not contradicted by living ones. It is 
then stated as a result of experience. The inference concerning 
the life of a living person could then be drawn from the rule.
It could be said that “Socrates will not live to be 200 ; he will
probably not live to be 100”. The fundamental difference
between the two methods of approach is that in the former,
where the major premiss is known a priori, we always proceed
from the general to the particular; in the latter we get the 
major premiss itself by asserting as a general proposition what 
was previously known only in particular instances. The former 
method is deduction, the latter induction. In both cases we 
proceed from the premisses to the conclusion by means of 
an apparent syllogism; but there is a significant difference, 
due to the difference in the nature of the available knowledge 
about the major premiss. Suppose that two people, while 
Socrates was alive, both drew the inference that he would not 
live to be 200, one basing his beliefs on human life in general 
on intuitive knowledge, the other on previous experience;
and suppose that Socrates nevertheless lived to be 2000. 
Suppose further that our doubter paid a visit to Elysium and 
interviewed their shades. The former would have to admit 
that his intuitive knowledge, which he had held with certainty, 
was wrong, or to say that Socrates was not a man but an im- 
mortal god, or perhaps to resort to abuse. The latter would 
explain that the major premiss in his inference was not known 
with certainty, but that it was extremely probable on the 
evidence before him. The inference had been correct for some 
thousands of millions of people that lived when or after it was 
drawn, and in the circumstances it was not so bad that there 
had been one exception to the general rule. If he chose to be 
aggressive he might ask whether Socrates had been medically 
examined recently with a view to finding out the causes of his 
anomalous behaviour; for one of the chief functions of 
exceptions is to improve the rule. 

The inference with regard to Socrates has actually been 
verified, but the situation has arisen with respect to many 
other scientific laws. At present we are faced with the inaccuracy
of Euclid’s parallel axiom, which for millennia was
considered intuitively obvious ; with the inaccuracy of Newton’s
law of gravitation, which had been well established by
experience and had been believed for centuries to be exact; 
with the failure in stars of the law of the indestructibility of 
matter ; and with the discordance of the classical undulatory 
theory of light with the group of facts known as quantum 
phenomena. For twenty years physical science has been 
modifying and reconstructing its most fundamental laws as a 
result of new knowledge. The reconstruction has followed, 
and will continue to follow, the old method, but the results
will be different because new facts have to be fitted in. Will 
modern physics suffer in turn the fate of the old? Perhaps; 
nobody knows. But in the circumstances we must raise a 
group of questions more fundamental and general than any 
physical law. Have recent developments shown that scientific 
method itself is open to suspicion, and if so, is there a better 
one? Just how much do we mean when we assert the truth 
of a scientific generalization? When we have made such a 
generalization, what reason have we for supposing that further 
instances of it will be true? 

1·3. The answers to these questions may be stated at once.
There is no more ground now than thirty years ago for doubt- 
ing the general validity of scientific method, and there is no 
adequate substitute for it. When we make a scientific genera- 
lization we do not assert the generalization or its consequences 
with certainty ; we assert that they have a high degree of pro- 
bability on the knowledge available to us at the time, but that 
this probability may be modified by additional knowledge. 
Our answer is that returned to the doubter by the second 
shade. The more facts are shown to be co-ordinated by a law, 
the higher the probability of that law and of further inferences 
from it. But we can never be entirely sure that additional 
knowledge will not some day show that the law is in need of 
modification. The law is provisional, not final; but scientific 
method provides its own means of assimilating new knowledge
and improving its results. The notion of probability,
which plays no part in logic, is fundamental in scientific in- 
ference. But the mere notion does not take us far. We must 
consider what general rules it satisfies, what probabilities are 
attached to propositions in particular cases, and how the 
theory of probability can be developed so as to derive esti- 
mates of the probabilities of propositions inferred from others 
and not directly known by experience. 

At the same time a remarkable thing happens. It is found 
that general propositions with high probabilities must have 
the property of mathematical or logical simplicity. This leads 
to a reaction upon the descriptive part of science itself. The 
number of possible methods of classifying sensations is 
colossal, perhaps infinite. But the importance of simple laws 
in inference leads us to concentrate on those properties of 
sensations that actually satisfy simple laws as far as they have 
been tested. Thus the classifications of sensations actually 
adopted in practical description are determined by considera- 
tions derived from the theory of inference ; and probability, 
from being a despised and generally avoided subject, becomes 
the most fundamental and general guiding principle of the 
whole of science. 



CHAPTER II 


PROBABILITY 

Oh, it ain’t gonna rain no mo’, no mo’, 

It ain’t gonna rain no mo’ ! 

How in the hell can the old folks tell? 

Tain’t gonna rain no mo’ ! 

Messrs Layton and Johnstone 


2·1. What is probability?

Suppose that a man wishes to catch a train announced 
to start at 1.0 p.m. When he is a quarter of a mile from the
station he looks back and sees that a church clock some 
distance away indicates 12.55. Will he catch the train? 

From previous experience he knows that a quarter of a mile 
in five minutes means comfortable walking without wasting 
time. The distance, with slight exertion, can be done in four 
minutes. Hence he may reasonably expect to catch the train, 
especially if he hurries slightly. But he has to get a ticket 
before he will be admitted to the platform. If he finds nobody 
waiting at the booking office this is a matter of ten seconds ; 
but if there is a queue of ten people it will take two minutes, 
and he has no means of knowing which will occur in this case. 
Again, though the church clock is usually reliable, it has been 
known on a few occasions to be as much as three minutes 
slow. If that is so on this occasion, and the train is punctual, 
his chance of catching the train disappears. On the other 
hand, if the train is a few minutes late, as sometimes happens, 
he will catch it even if there is a queue and the clock is slow. 
Further, there is always the possibility of something quite 
unforeseen, such as an accident on the line. In that event 
the 11.14 train may arrive at 1.30 and his problem will be
solved. 

Now we notice that in this situation the man has some 
definite information, which is relevant to the proposition “he
will catch the train”. But numerous other possibilities, none
of which he can foresee, are also intensely relevant. Therefore 
his available knowledge, though relevant to the proposition 
at issue, is not such as to make it possible to assert definitely 
that this proposition is true or false. Further, extra data will 
have a definite effect on his attitude to the proposition. If he 
meets an astronomer whose watch has just been compared 
with a wireless time signal, and who assures him that the 
church clock is accurate, he feels more confident. On the 
other hand, if a crowded omnibus passes him he expects his 
worst fears about the queue to be verified. Thus the attitude 
to the proposition under discussion does not amount to a 
definite assertion of its truth or falsehood ; it is an impression 
capable of being modified at any time by the acquisition of 
new knowledge. 

Probability expresses a relation between a proposition and 
a set of data. When the data imply that the proposition is true, 
the probability is said to amount to certainty; when they 
imply that it is false, the probability becomes impossibility. 
All intermediate degrees of probability can arise. 

The relation of the laws of science to the data of observa- 
tion is one of probability. The more facts are in agreement 
with the inferences from a law, the higher the probability of 
the law becomes ; but a single fact not in agreement may re- 
duce a law, previously practically certain, to the status of an 
impossible one. A specimen of a practically certain law is
Ohm’s law for solid conductors. Newton’s inverse square
law of gravitation first became probable when it was shown 
to give the correct ratio of gravity at the earth’s surface to the 
acceleration of the moon in its orbit. Its probability increased 
as it was shown to fit the motions of the planets, satellites, 
and comets, and those of double stars, with an astonishing 
degree of accuracy. Leverrier’s discovery of the excess motion 
of the perihelion of Mercury scarcely changed this situation, 
for the phenomenon was qualitatively explicable by the attrac- 
tion of the visible matter within Mercury’s orbit. Newton’s 
law was first shown to be wrong, as a universal proposition,
when it was found that such matter could not actually be 
present in sufficient quantity to account for the anomalous 
motion of Mercury. 
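
The behaviour described in this paragraph, a steady rise in probability with each agreeing fact and a collapse on a single clear disagreement, can be illustrated numerically. The sketch below rests on assumptions that are not in the text: it uses the rule of inverse probability (Bayes' rule), which has not been introduced at this point; it takes a hypothetical initial number of 1/2 for the law; and it supposes a hypothetical alternative under which an agreeing observation turns up half the time. The function names are likewise only illustrative.

```python
from fractions import Fraction

def update(prior_law, agrees, alt_likelihood=Fraction(1, 2)):
    """Return the probability of the law after one observation.

    prior_law      -- probability of the law before the observation
    agrees         -- True if the observation matches the law's prediction
    alt_likelihood -- ASSUMED chance that the alternative yields an
                      agreeing observation (hypothetical value)
    """
    # The law is taken to predict the observation with certainty.
    like_law = Fraction(1) if agrees else Fraction(0)
    like_alt = alt_likelihood if agrees else 1 - alt_likelihood
    evidence = prior_law * like_law + (1 - prior_law) * like_alt
    return prior_law * like_law / evidence

def run(observations, prior_law=Fraction(1, 2)):
    """Apply update() to a sequence of True/False observations."""
    p = prior_law
    history = [p]
    for agrees in observations:
        p = update(p, agrees)
        history.append(p)
    return history

rising = run([True, True, True])            # three agreeing facts
fallen = run([True, True, True, False])     # then one disagreeing fact
```

With these assumed values three agreeing observations raise the number from 1/2 to 8/9, and one disagreeing observation then reduces it to 0, whatever height it had reached.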

The fundamental notion of probability is intelligible a 
priori to everybody, and is regularly used in everyday life. 
Whenever a man says “I think so” or “I think not” or “I
am nearly sure of that” he is speaking in terms of this con- 
cept; but an addition has crept in. If three persons are pre- 
sented with the same set of facts, one may assert that he is 
nearly certain of a result, another that he believes it probable, 
while the third will express no opinion at all. This might 
suggest that probability is a matter of differences between 
individuals. But an analogous situation arises with regard to 
purely logical inference. One person, reading the proof of 
Euclid’s fifth proposition, is completely convinced; another 
is entirely unable to grasp it; while there is at any rate one 
case on record when a student said that the author had ren- 
dered the result highly probable. Nobody says on this 
account that logical demonstration is a matter for personal 
opinion. We say that the proposition is either proved or not 
proved, and that such differences of opinion are the result of 
not understanding the proof, either through inherent in- 
capacity or through not having taken the necessary trouble. 
The logical demonstration is right or wrong as a matter of the 
logic itself, and is not a matter for personal judgment. We say 
the same about probability. On a given set of data p we say 
that a proposition q has in relation to these data one and only 
one probability. If any person assigns a different probability, 
he is simply wrong, and for the same reasons as we assign in 
the case of logical judgments. Personal differences in assign- 
ing probabilities in everyday life are not due to any ambiguity 
in the notion of probability itself, but to mental differences 
between individuals, to differences in the data available to 
them, and to differences in the amount of care taken to 
evaluate the probability. 

2·2. The mathematical discussion of probability depends on
the principle that probabilities can be expressed by means of 
numbers. This depends in turn on two deeper postulates : 

1. If we have two sets of data p and p′, and two propositions
q and q′, and we consider the probabilities of q given p, and of q′
given p′, then whatever p, p′, q, q′ may be, the probability of q given
p is either greater than, equal to, or less than that of q′ given p′.

2. All propositions impossible on the data have the same
probability, which is not greater than any other probability;
and all propositions certain on the data have the same probability,
which is not less than any other probability.

The relations greater than and less than are transitive ; that 
is, if one probability is greater than a second, and the second 
greater than a third, then the first probability is greater than 
the third. If one probability is greater than a second, the 
second is said to be less than the first ; and if neither of two 
probabilities is greater than the other we say that they are 
equal. This postulate ensures the existence of a definite order 
among probabilities, such that each probability follows all 
smaller ones and precedes all greater ones. 

Such an order once established, we can construct a corre- 
spondence between probabilities and real numbers, so that 
to every probability corresponds one and only one number, 
and so that of every pair of probabilities the less corresponds 
to the smaller number. When this is done the system of 
numbers can be used as a scale of reference for probabilities. 
But the choice is not yet unique. Obviously if $x_1, x_2, \ldots, x_n$
are a set of positive numbers in increasing order of
magnitude, $x_1^2, x_2^2, \ldots, x_n^2$ are another set, $e^{x_1}, e^{x_2}, \ldots, e^{x_n}$ a
third, $-1/x_1, -1/x_2, \ldots, -1/x_n$ a fourth, and any number of
such sets can be found, such that if probabilities correspond
term by term with the numbers of one set in order of magni- 
tude they will correspond equally well with those of any 
other set. We need a further rule before we can decide what 
number to attach to any given probability. Such a rule is a 
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mere method of working, or convention ; it expresses no new 
assumption. We decide that 

3. If several propositions are mutually contradictory on the
data, the number attached to the probability that some one of
them is true shall be the sum of those attached to the probabilities
that each separately is true.

If we do this it follows at once that 0 is the number to be
attached to a proposition impossible on the data. For consider
any three mutually exclusive propositions p, q, r, and
suppose we have the further datum that p is true. The number
attached to a proposition impossible on the data being a, it
follows that the numbers attached to q and r separately on
the data are both a. Hence, by our rule, since q and r are
mutually exclusive, the number attached to the proposition
that one of them is true is 2a. But the proposition “q or r is
true” is itself impossible on the data and therefore has the
number a attached to it. Hence 2a = a, and therefore a = 0.

Again, let us consider any set of m equally probable and 
mutually contradictory propositions, and call the number 
attached to any one of them, on the same data, x. If we 
select any l of them, the number attached to the proposition
that one of these l is true is lx, by our rule.

Now take l = m, and suppose that on our data there is just
one true proposition among the m, but that we have no means
of knowing which it is. The number attached to the proposition
that one of the m propositions is true is mx. But on our
data this proposition is certain, and therefore mx is the
number corresponding to certainty, which is a definite constant
by Prop. 2. We therefore choose 1 as the constant to be
attached to certainty. This is another convention. Thus
mx = 1, and we derive the rule :

4. If m propositions are equally probable on the data and
mutually contradictory, and one of them is known to be true,
each has the number 1/m associated with it. Further, the proposition
that one out of any l of them is true has the number l/m
associated with it.

The conditions for the application of this method are practically
realizable. Suppose that m balls, one of them with a
characteristic mark on it, but indistinguishable by touch,
were placed in a bag and shaken. l balls are then withdrawn.
Then the proposition that any particular ball is the marked
one is inconsistent with the proposition that any other is
marked, and all such propositions are equally probable. We
have therefore a set of equally probable and mutually exclusive
propositions, m in number. Our rule therefore has
a practical application. Also m may be any integer, and l may
be any integer less than m or equal to it. Hence

5. Any rational proper fraction, including 0 and 1, can be a
probability number.
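
The counting behind the number l/m can be checked directly. The following is a minimal sketch, assuming that each of the m balls is equally probable to be the marked one and that the numbers for mutually contradictory propositions are simply added; the function names are hypothetical and exact fractions are used so that no rounding enters.

```python
from fractions import Fraction

def rule_number(l, m):
    """Number l/m for 'the marked ball is among the l withdrawn'."""
    if m < 1 or not 0 <= l <= m:
        raise ValueError("need m >= 1 and 0 <= l <= m")
    return Fraction(l, m)

def counted_number(l, m):
    """The same number built up by adding 1/m once per withdrawn ball."""
    drawn = range(l)                 # which l balls are drawn is immaterial
    return sum((Fraction(1, m) for _ in drawn), Fraction(0))

three_of_ten = counted_number(3, 10)
none_drawn = counted_number(0, 5)    # impossible on the data
all_drawn = counted_number(5, 5)     # certain on the data
```

Taking l = 0 and l = m recovers the two end values, 0 for impossibility and 1 for certainty, and every intermediate choice gives a rational proper fraction.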

We shall call the class of probabilities expressible by
rational fractions R-probabilities.

It follows from this that any probability can be made to
correspond to a real number, rational or irrational. For any
given probability P either corresponds to a rational fraction
or does not. In the former case the proposition is granted.
In the latter case every R-probability is either greater or less
than P. Hence P divides the R-probabilities into two classes
$R_1$ and $R_2$, such that the probabilities in $R_1$ are all less than
P and those in $R_2$ are all greater than P. Also, since the
relation “greater than” among probabilities is transitive,
every fraction corresponding to an $R_2$ probability is greater
than every fraction corresponding to an $R_1$ probability. Hence
P determines a cut in the series of rational fractions. But this
is precisely the method of defining a real irrational number ;
when it is specified which rational fractions are on one side
of the cut and which on the other side, there is one and only
one real number that can occupy the cut. We then associate
the probability P with this number. In this way we arrive
at the result :

6. Every probability can be associated with a real number,
rational or irrational.
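
The way a cut fixes a single real number can be made concrete. The sketch below is an illustration only: it assumes a hypothetical probability P whose number is 1/√2, and it assumes that the only thing known about P is whether any given rational fraction lies below it. Repeated halving then encloses the number between rational bounds as closely as desired. The function names are not from the text.

```python
from fractions import Fraction

def locate(is_less_than_P, steps=20):
    """Enclose the number for P using only comparisons with fractions.

    is_less_than_P(frac) -- True when the fraction frac is less than P
    Returns (low, high), rational bounds with low < P < high.
    """
    low, high = Fraction(0), Fraction(1)
    for _ in range(steps):
        mid = (low + high) / 2
        if is_less_than_P(mid):
            low = mid        # mid belongs to the lower class
        else:
            high = mid       # mid belongs to the upper class
    return low, high

def below_inverse_root_two(frac):
    # ASSUMED example: a fraction in [0, 1] is below 1/sqrt(2)
    # exactly when its square is below 1/2 (exact arithmetic).
    return frac * frac < Fraction(1, 2)

low, high = locate(below_inverse_root_two)
```

After twenty halvings the two bounds differ by 1/2^20, and no rational fraction ever coincides with the number sought, which is the situation the cut describes.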

We still have to prove that the results given by our rules 
are consistent ; that is, if a probability P is greater than another
probability Q, that the number associated with P by our rules
is greater than that associated with Q. Suppose first that
P and Q are both R-probabilities. Then we can find four
integers l, m, r, s so that the number associated with P is l/m
and that associated with Q is r/s. Now consider a class of ms
mutually exclusive propositions containing one true one. We
may divide them up into m sets of s each ; one and only one of
these sets contains the true proposition. The probability-number
that one of l of these sets contains the true proposition
is l/m. But this is also the probability-number that one
of ls propositions selected from the original ms propositions
shall be the true one, which by our rule is ls/ms and equal to
l/m, as it should be. Thus l/m is the number associated with
the proposition that one out of the ls alternatives is true;
similarly r/s is associated with the proposition that one out
of rm alternatives is true. If then P is greater than Q, the
number of alternatives needed to give probability P must
exceed that needed to give probability Q; therefore ls is
greater than rm. But this is equivalent to saying that l/m is
greater than r/s; and therefore the greater probability is
associated with the greater number.
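
The comparison used in this argument amounts to re-expressing both fractions over a common class of ms alternatives and counting. A minimal sketch, with a hypothetical function name and positive integers assumed for m and s:

```python
from fractions import Fraction

def more_probable(l, m, r, s):
    """True when l/m exceeds r/s, judged by counting alternatives.

    Over a common class of m*s alternatives, l/m requires l*s of them
    and r/s requires r*m; the larger count marks the larger probability.
    """
    return l * s > r * m

# Compare the counting test with direct comparison of the fractions
# for every pair of proper fractions with small denominators.
agreement = all(
    more_probable(l, m, r, s) == (Fraction(l, m) > Fraction(r, s))
    for m in range(1, 8) for l in range(0, m + 1)
    for s in range(1, 8) for r in range(0, s + 1)
)
```

For instance 2/3 against 3/5 gives 10 alternatives against 9 out of 15, so the former is the greater.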

Consistency is therefore proved for R-probabilities. For
others the result is easily generalized. For if two non-rational
probabilities are associated with real numbers α and β, of
which α is the greater, we can find a rational fraction l/m
lying between them. Then the probability associated with α
is greater than that associated with l/m, and that associated
with l/m is greater than that associated with β. Hence, in
virtue of the transitive property of the relation more probable
than, the probability associated with α is greater than that
associated with β. In other words, the greater number corresponds
to the greater probability.

We have seen how definite numbers can be associated with 
probabilities, so that the higher number always corresponds 
to the higher probability. In consequence of our fundamental 
assumption our rules always imply the existence of a definite
probability-number. The rules, as we stated before, are conventions
and not hypotheses; for if the probability-number
assigned by our rules is x, any function of x that always
increases with x would satisfy the fundamental assumption. 
But the choice that we have made seems to be far the most 
convenient. Henceforth we shall have no need to speak of 
probabilities apart from their associated numbers, and when 
we speak of the probability of a proposition on given data 
we shall mean the number associated with the probability by 
our rules. 
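
The sense in which the rules are conventions can be seen in a small numerical check. The sketch below takes three hypothetical probability-numbers and two increasing functions of x chosen only for illustration, the square and the exponential. Both keep the order of the probabilities, so both would satisfy the fundamental assumption; but neither keeps the simple addition rule for mutually contradictory propositions, which is what makes the adopted scale the convenient one.

```python
import math
from fractions import Fraction

# Hypothetical numbers for three mutually contradictory propositions.
x = [Fraction(1, 6), Fraction(1, 3), Fraction(1, 2)]

rescalings = {
    "identity": lambda v: float(v),
    "square": lambda v: float(v * v),
    "exp": lambda v: math.exp(float(v)),
}

def preserves_order(f, values):
    """True if f leaves the increasing order of the values unchanged."""
    mapped = [f(v) for v in values]
    return all(a < b for a, b in zip(mapped, mapped[1:]))

def keeps_addition_rule(f, a, b):
    """True if the rescaled number for 'a or b' is still the sum."""
    return math.isclose(f(a + b), f(a) + f(b))

order_kept = {name: preserves_order(f, x) for name, f in rescalings.items()}
sum_kept = {name: keeps_addition_rule(f, x[0], x[1])
            for name, f in rescalings.items()}
```

Every rescaling passes the order test, while only the unaltered scale passes the addition test.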

2·3. We now introduce the notation $P(p \mid q)$ for the probability
of the proposition p on the data $q$*. It may be read
“the probability of p given q”. We also adopt the following
notations from mathematical logic.

$\sim p$ means the contradictory of p, that is, the statement
that p is untrue. It is read “not p”.

$p \vee q$ means the disjunction of p and q, that is, the proposition
that at least one of p and q is true. It applies whether p
and q are consistent with each other or not. It is read “p or q”.

* W. E. Johnson and J. M. Keynes use the notation $p/q$ for the probability
of p given q. The disadvantage of this notation is that the oblique
stroke is a recognized device for printing fractions. As actual fractions
will often occur explicitly in this work it seems desirable to avoid the
confusion in reading that would arise from a similarity in notation.

In the earlier papers by Wrinch and Jeffreys f the notation P(p\q) 
was used. The use of P calls attention directly to the fact that the number 
is a probability-number, and therefore to the fact that the elements within 
the bracket are propositions, and avoids complexity when the product of 
several probabilities has to be written. But the colon has the drawback 
that in the notation of mathematical logic it is often wanted for a bracket. 
The vertical stroke also has a meaning in mathematical logic, but there 
is no likelihood of confusion. 

t “On Some Aspects of the Theory of Probability’*, Phil. Mag. 38 , 
1919, 715-73 1. “On Certain Fundamental Principles of Scientific In- 
quiry”, Phil. Mag. 42 , 1921, 369-390; (second paper), Phil. Mag. 46 , 
^ 9 ^ 3 ) 368-374. “The Theory of Mensuration”, Phil. Mag. 46 , 1923, 
1-22. “The Relation between Geometry and Einstein’s Theory of Gravi- 
tation”, Nature^ 106 , 1921, 806-809, 
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p . q means the proposition that/) and q are both true. It is 
called the joint assertion of p and and is read “/> and q^\ 
These notations may be combined. Thus '^{p •q) means 
the proposition that p and q are not both true, and therefore 
is equivalent to ^py ^ q. 

Evidently 

P(/)|/>) = i; P(~/>|/)) = o. (i) 

2·31. Now suppose we have a set of data h. Then the following
four propositions are mutually exclusive: $p\,.\,q$, $p\,.\,\sim q$,
$\sim p\,.\,q$, $\sim p\,.\,\sim q$. By our original rule the probability that one of
$p\,.\,q$ and $p\,.\,\sim q$ is true is the sum of their probabilities
separately. But one of $p\,.\,q$ and $p\,.\,\sim q$ is true if, and only
if, p is true.

Hence

$$P(p \mid h) = P(p\,.\,q \mid h) + P(p\,.\,\sim q \mid h). \tag{1}$$

Similarly

$$P(q \mid h) = P(p\,.\,q \mid h) + P(\sim p\,.\,q \mid h). \tag{2}$$

By addition

$$P(p \mid h) + P(q \mid h) = 2P(p\,.\,q \mid h) + P(p\,.\,\sim q \mid h) + P(\sim p\,.\,q \mid h). \tag{3}$$

But the disjunction $p \lor q$ is true if and only if one of $p\,.\,q$,
$\sim p\,.\,q$, and $p\,.\,\sim q$ is true. Hence

$$P(p \lor q \mid h) = P(p\,.\,q \mid h) + P(p\,.\,\sim q \mid h) + P(\sim p\,.\,q \mid h), \tag{4}$$

and therefore, comparing the last two equations, we have

$$P(p \mid h) + P(q \mid h) = P(p \lor q \mid h) + P(p\,.\,q \mid h). \tag{5}$$
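
The addition rule (5) can be checked on a small finite case. The following Python sketch assumes, purely for illustration, six equally probable and mutually exclusive alternatives (a fair die, which is not an example used in the text) and represents a proposition by the set of alternatives on which it is true. The helper name `prob` is hypothetical.

```python
from fractions import Fraction

# Illustrative assumption: six equally probable, mutually exclusive outcomes.
OUTCOMES = frozenset(range(1, 7))

def prob(event):
    """Probability of a proposition, given as the set of outcomes on which it is true."""
    return Fraction(len(event & OUTCOMES), len(OUTCOMES))

p = frozenset({2, 4, 6})   # "the outcome is even"
q = frozenset({4, 5, 6})   # "the outcome exceeds 3"

lhs = prob(p) + prob(q)            # P(p | h) + P(q | h)
rhs = prob(p | q) + prob(p & q)    # P(p v q | h) + P(p . q | h)
print(lhs, rhs)
```

Set union stands for the disjunction and set intersection for the joint assertion, so the two sides agree whether or not p and q overlap.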

2·32. Consider next a class of n propositions, of which we
know that one and only one is true, and any one is as probable
as any other. Then if any m of them are selected the probability
that one of these m is true is m/n. Let q denote the
proposition that one of these m is true, and h the data we had
initially. Then

$$P(q \mid h) = m/n. \tag{1}$$

Consider another class of the original propositions and let p
denote the proposition that some member of this class is true.
Then the proposition that p and q are both true is the proposition
that some proposition in the common part of the two
classes is true. Let the number of propositions in the common
part be l. Then

$$P(p\,.\,q \mid h) = l/n = (l/m)(m/n). \tag{2}$$

Now consider $P(p \mid q\,.\,h)$, the probability that p is true given
h and q. h and q are both true if the true proposition is included
in the class of number m. p is true, given q and h, if
the true proposition is one of the common part, of number l,
given that it is one of the class of number m. Hence

$$P(p \mid q\,.\,h) = l/m, \tag{3}$$

and finally

$$P(p\,.\,q \mid h) = P(p \mid q\,.\,h)\,P(q \mid h). \tag{4}$$

This proposition is of capital importance. We have proved it 
for cases where p and q are expressible as disjunctions of 
equally probable and mutually exclusive alternatives. It can- 
not be proved in general without some further assumption. 
If $P(p\,.\,q \mid h)$ were a function of $P(p \mid q\,.\,h)$ and $P(q \mid h)$,
different from their product, then we could choose l, m, and
n so as to make the theorem untrue in some of the cases where
we have proved it true ; but we cannot absolutely exclude the 
possibility of another variable entering into the equation and 
producing exceptions to the rule (4) when the probabilities 
are not P-probabilities. It does not seem worth while, 
however, to consider such a possibility at present. It 
will be assumed without further discussion that (4) holds 
in general. 
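
The counting argument behind (4) can be written out directly. The sketch below is an illustration only: the two classes and the value n = 12 are arbitrary choices, and the function name `product_rule_terms` is hypothetical. It computes the three quantities l/n, l/m and m/n from the sizes of the classes.

```python
from fractions import Fraction

def product_rule_terms(n, q_class, p_class):
    """Return (P(p.q|h), P(p|q.h), P(q|h)) for classes chosen from n
    equally probable, mutually exclusive alternatives."""
    m = len(q_class)               # size of the class asserted by q
    l = len(q_class & p_class)     # size of the common part
    return Fraction(l, n), Fraction(l, m), Fraction(m, n)

n = 12
q_class = set(range(0, 8))     # m = 8
p_class = set(range(5, 11))    # common part {5, 6, 7}, so l = 3
joint, p_given_q, q_prior = product_rule_terms(n, q_class, p_class)
print(joint, p_given_q * q_prior)
```

This only exhibits the rule in the case for which the text proves it; it says nothing about the general case, which the text assumes.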


2·33. We have also by symmetry

$$P(p\,.\,q \mid h) = P(q \mid p\,.\,h)\,P(p \mid h), \tag{5}$$

and therefore

$$P(p \mid q\,.\,h) = \frac{P(q \mid p\,.\,h)\,P(p \mid h)}{P(q \mid h)}. \tag{6}$$

This theorem yields as an immediate consequence the principle
of inverse probability. Suppose that q is a logical consequence
of p and h, so that $P(q \mid p\,.\,h) = 1$, and suppose
further that q has been verified. Then $P(p \mid h)$ is the prior
probability of p before the verification and $P(p \mid q\,.\,h)$ the
posterior probability after the verification. Then our result is
that the posterior probability of p is the prior probability of p 
divided by the prior probability of the consequence. The 
more remarkable the consequence, then, the greater the in- 
crease produced by its verification on the probability of the 
hypothesis under test. 

2·34. Again, suppose that $p_1, p_2, \ldots, p_n$ are a number of mutually
exclusive hypotheses such that one of them must be true. Then
for each we have a relation of the form (6), and therefore

$$\frac{P(p_1 \mid q\,.\,h)}{P(q \mid p_1\,.\,h)\,P(p_1 \mid h)} = \frac{P(p_2 \mid q\,.\,h)}{P(q \mid p_2\,.\,h)\,P(p_2 \mid h)} = \cdots = \frac{P(p_n \mid q\,.\,h)}{P(q \mid p_n\,.\,h)\,P(p_n \mid h)}. \tag{1}$$

But

$$\sum_{r=1}^{n} P(p_r \mid q\,.\,h) = P(p_1 \lor p_2 \ldots \lor p_n \mid q\,.\,h), \tag{2}$$

since the p's are mutually exclusive,

$$= 1, \tag{3}$$

since it is known that one of the p's is true. Hence each of
the fractions in (1) is equal to

$$\frac{1}{\sum_{r=1}^{n} P(q \mid p_r\,.\,h)\,P(p_r \mid h)}.$$

Therefore

$$P(p_r \mid q\,.\,h) = \frac{P(q \mid p_r\,.\,h)\,P(p_r \mid h)}{\sum_{r=1}^{n} P(q \mid p_r\,.\,h)\,P(p_r \mid h)}. \tag{4}$$

This theorem* is to the theory of probability what Pythagoras's
theorem is to geometry.
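
Theorem (4) translates into a few lines of code. The following is a minimal sketch, assuming the hypotheses are exhaustive and mutually exclusive so that the priors sum to unity; the function name `posterior` and the numerical values are illustrative and do not come from the text.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Posterior P(p_r | q.h) from priors P(p_r | h) and likelihoods P(q | p_r.h)."""
    if sum(priors) != 1:
        raise ValueError("priors of exhaustive, exclusive hypotheses must sum to 1")
    weights = [lk * pr for lk, pr in zip(likelihoods, priors)]
    total = sum(weights)           # the denominator of (4)
    if total == 0:
        raise ValueError("q has zero probability on every hypothesis")
    return [w / total for w in weights]

priors = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
likelihoods = [Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)]
post = posterior(priors, likelihoods)
print(post)
```

Exact fractions are used so that the posterior probabilities can be seen to sum to unity without rounding error.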


2·341. It follows at once that $P(p_r \mid q\,.\,h)$ can hardly ever
be unity; for in the fraction on the right the denominator is
the sum of the numerator and a number of other positive
terms. But if q has a small probability on all the hypotheses
except one, $p_1$ say, and a large probability on that one, and
the prior probabilities of the hypotheses are comparable, then
the posterior probability of $p_1$ may approach unity. This is
the type of inference known as a crucial test.


2·342. Again, suppose that $p_1$ implies $\sim q$, so that

$$P(q \mid p_1\,.\,h) = 0,$$

and that nevertheless q is verified. Then (4) shows that

$$P(p_1 \mid q\,.\,h) = 0.$$

This explains how the failure of a crucial test may reduce a 
previously plausible hypothesis to impossibility. 


2·343. It may happen that the probability of q is the same
on all the hypotheses under discussion; that is, that
$P(q \mid p_r\,.\,h)$ is the same for all values of r. Then

$$P(p_r \mid q\,.\,h) = \frac{P(p_r \mid h)}{\sum_{r=1}^{n} P(p_r \mid h)}. \tag{1}$$

But the $p_r$'s are known to be mutually exclusive, and one of
them is true. Hence

$$\sum_{r=1}^{n} P(p_r \mid h) = 1,$$

and

$$P(p_r \mid q\,.\,h) = P(p_r \mid h). \tag{2}$$

Thus for each hypothesis the posterior probability is equal
to the prior probability, and the test does nothing to help us
to decide between the hypotheses. This is the case of
irrelevance.

* Bayes, Phil. Trans. 53, 1763, 376-398.
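
The three special cases of 2·341 to 2·343 can be illustrated numerically. The sketch below assumes three hypotheses with equal prior probabilities and invented likelihoods; neither the numbers nor the function name `posterior` appear in the text.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Apply 2·34 (4): weight each prior by its likelihood and renormalize."""
    weights = [lk * pr for lk, pr in zip(likelihoods, priors)]
    total = sum(weights)
    return [w / total for w in weights]

priors = [Fraction(1, 3)] * 3   # comparable prior probabilities (an assumption)

# Crucial test: q is probable on p_1 only, so p_1 approaches but does not reach unity.
crucial = posterior(priors, [Fraction(99, 100), Fraction(1, 100), Fraction(1, 100)])

# Failed test: p_1 implies not-q, yet q is verified, so p_1 becomes impossible.
refuted = posterior(priors, [Fraction(0), Fraction(1, 2), Fraction(1, 2)])

# Irrelevance: q is equally probable on every hypothesis, so nothing changes.
irrelevant = posterior(priors, [Fraction(1, 4)] * 3)

print(crucial[0], refuted[0], irrelevant == priors)
```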


2·344. On the other hand suppose that q is a logical consequence
of h alone. Then $P(q \mid h)$ and $P(q \mid p\,.\,h)$ are both
unity, and

$$P(p \mid q\,.\,h) = P(p \mid h). \tag{1}$$


If for instance h consists of the primitive propositions of logic 
and mathematics, and q is any demonstrated proposition of 
pure mathematics, then q can be included in the data without 
affecting any probability. 

It may be mentioned that the case where q is implied by p
and contradicted by h cannot arise; for $P(q \mid p\,.\,h)$ depends
on the possibility of p and h being data at the same time, and 
this cannot happen if one implies a consequence contradicted 
by the other, for then they would be inconsistent. 


2·4. In all estimates of posterior probability by means of
the theorem of 2·34, the prior probabilities of the hypotheses
appear explicitly. The theorem does not therefore give definite 
answers unless these prior probabilities are known ; and here 
we come upon the greatest stumbling-block in the theory of 
probability. 

How do we assess the probability of a proposition before
we have any means of knowing whether it is true or false?
It has often been said that assessing a probability implies 
some knowledge, and that therefore we cannot assign a proba- 
bility when we are in complete ignorance. This opinion must 
be directly contradicted. Complete ignorance is a state of 
knowledge, just as much as a statement that a vessel is empty 
is a statement of how much there is in it, and the probabilities 
assigned upon it are perfectly definite. If we have no means 
of choosing between alternatives, the probabilities attached 
to those alternatives are equal*. If there are n alternatives
just one of which must be true, the prior probability of each
is 1/n.

* This is usually known as the Principle of Sufficient Reason.

The issue is fundamental. Either we can learn from ex- 
perience or we cannot. The ability to learn from experience 
demands the concept of probability in relation to varying 
data, and the recognition of the meanings of more probable 
than and less probable than. Using only rules based on these 
concepts, we have shown how probabilities can be assessed. 
We must either accept the results or reject the fundamental 
principle and say that it is impossible to learn from experi- 
ence. Whatever subject we take up, we start from ignorance 
and build up knowledge by means of experience. Everybody 
but a few philosophers recognizes the general validity of the 
process ; and even the philosophers that say that they reject 
it show by their actions that their rejection is purely academic. 
Put the most sceptical philosopher in the situation described 
at the beginning of this chapter, and he will behave just like 
anybody else and probably express the same doubts. 

But we have still not stated the method completely. Imagine 
a new-born baby to have seen only two objects, one blue and 
one yellow. Another object is to be introduced from outside. 
What is the probability that that object will be blue? If the 
alternatives are that it must be either blue or yellow the 
correct probability is 1/2. But this is the probability on the
datum that only two colours are possible. If the next object 
introduced proves to be pink this datum is proved wrong, 
and the fact that the probability was correctly assigned for 
it ceases to be of practical interest. This is a situation 
that we must accept in practice; we are often in situations 
where we cannot foresee every possible alternative, and 
allowance for the possibility of unforeseen alternatives must 
be made. 

The issue can be stated in two simple ways. 

1. The new object will be either blue, yellow, or some other
colour. If we treat these as three equivalent alternatives the
probability of each is 1/3.
2. The new object will either have a colour known already
or a new one. If we treat these as two equivalent alternatives
the probability of each is 1/2 and the probability that the next
object will be blue is 1/4.

Neither suggestion is quite satisfactory. “Some other 
colour” implies a choice among all possible other colours, 
which may be 0 to infinity in number, and it is not obvious
that it can be treated as on an equivalent footing with one 
definite colour*. Nor is it obvious that, when the very ex- 
istence of any other colour is problematic, some other colour 
is as likely to turn up as one of those already known. 

The second suggestion is obviously wrong. If it were 
correct to treat the known and the unknown as equivalent 
alternatives, we could never, however many colours had been 
observed, have any additional confidence that the next one 
would have a colour already known. It therefore contradicts 
our fundamental postulate, that it is possible to learn from 
experience. What may be the correct answer will be indicated 
in the next chapter. 

2·5. But the usual difficulty in assessing a prior probability
at the beginning of an investigation does not arise from
ignorance. The customary obstacle is too much knowledge.
The statement that a probability number exists in every state
of knowledge is not the same as the statement that we know
what it is. The point may be illustrated from the purest of
pure mathematics, the theory of numbers. How many prime
numbers are there less than a billion? There is a number of
such numbers; authorities on the subject can even say approximately
what it is†; but just exactly how many prime
numbers there are under a billion is known to nobody. It
could be found out by trial, given sufficient time; but nobody
has yet had time to do it. This is our usual situation in assessing
a probability. The number of relevant facts is great,
and their bearing on the probability of the proposition under
discussion is difficult to evaluate precisely, though it may be
easy in general terms. In actual life we simply do not take
the trouble to evaluate the probability; we have not the time,
for nobody can remember at once or enumerate all the relevant
data at his disposal. If our traveller at the beginning of
this chapter stopped to evaluate the probability accurately at
any stage he would certainly miss his train.

* Keynes has, effectively, made this point. Cf. Treatise on Probability,
1921, 60.

† loge 10.

The actual situation is therefore that the prior proba- 
bilities enter into our formulae, but we do not know their 
values, and they always affect the posterior probabilities. If 
this were not true newspapers would employ expert cal- 
culators of probabilities instead of unreliable turf tipsters. 
But in scientific work, though we can never make the pos- 
terior probability completely determinate, we can make it so 
near zero or unity as to amount to practical certainty or 
impossibility for all ordinary values of the prior probability. 
This is done by repeated verification and crucial tests. We 
do not know the prior probability of a scientific law when we 
begin an investigation of whether it is true; we swamp the 
prior probability by the number and variety of the verifica- 
tions. The scientific man might, if he took enough trouble, 
evaluate the prior probability accurately; but in practice he 
is not interested in the accurate evaluation of a moderate 
probability. He prefers to obtain such additional information 
as will make the posterior probability approach impossibility 
or certainty whatever the prior probability may have been; 
and when that is done he no longer needs to evaluate the 
prior probability. Nevertheless it leaves its traces. The 
practical certainty or impossibility of an inference from 
abundant experience is not the same thing as absolute cer- 
tainty or impossibility, which can come only from direct 
sensation or a priori knowledge. 
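
The swamping of the prior probability by repeated verification can be sketched numerically. The example below rests on assumptions that are not in the text: the law under test makes every verified consequence certain, there is a single alternative hypothesis, and on that alternative each consequence has probability 1/2 independently of the others. The function name `posterior_after` is hypothetical.

```python
from fractions import Fraction

def posterior_after(prior, k, alt_likelihood=Fraction(1, 2)):
    """Posterior probability of a law after k of its consequences are verified.

    The law makes each consequence certain; the single assumed alternative
    gives each consequence probability alt_likelihood, independently.
    """
    rival = (1 - prior) * alt_likelihood ** k
    return prior / (prior + rival)

for prior in (Fraction(1, 1000), Fraction(1, 10), Fraction(1, 2)):
    print(prior, [float(posterior_after(prior, k)) for k in (0, 10, 20, 40)])
```

Under these assumptions very different priors lead to posterior probabilities that all lie close to unity after a few dozen verifications, though none of them ever equals unity.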



CHAPTER III 


SAMPLING 

Little drops of water, 

Little grains of sand 
Make the mighty ocean 
And the glorious land. 

Julia Carney 

3·1. We are now in a position to discuss one of the most
important applications of the theory of probability, the theory
of sampling. The first problem is as follows: There are n
objects with a defining property a. Of these, r have a further
property b. We select at random m of the objects. What is
the probability that l of these will have the property b?

We need a definition of what we mean by at random. We
mean that every possible selection of m objects from the
original n is equally probable. The total number of ways of
selecting m things from a set of n is denoted by ${}^nC_m$. It is
called the number of combinations of n things taken m at a
time, and it is shown in works on algebra to be equal to
$\dfrac{n!}{m!\,(n-m)!}$, where n! means the product of all the integers
from 1 up to n. There are r accessible objects with the property
b. We can select l of these in ${}^rC_l$ ways. The other n - r
objects have not the property b; and if the sample of m
objects contains l with the property b it must contain m - l
without it. Hence in our sampling we choose m - l objects
from a class of n - r, and this can be done in ${}^{n-r}C_{m-l}$ ways.
But any selection of l objects with the property b is consistent
with any selection of m - l objects without it, and therefore
the total number of ways of selecting m things so that l of
them will have the property b is ${}^rC_l \cdot {}^{n-r}C_{m-l}$. But by
hypothesis all the possible selections are equally probable
and mutually exclusive, and one of them is certain to be
made. Hence the probability that any particular one will be
made is $1/{}^nC_m$, and the probability that we shall make some
one of a set of number ${}^rC_l \cdot {}^{n-r}C_{m-l}$ is

$$g(l) = \frac{{}^rC_l \; {}^{n-r}C_{m-l}}{{}^nC_m}. \tag{1}$$


Since the set of number m must contain either 0, 1, 2, ... or m
things with the property b, the sum of the probabilities of
the various values of l must be unity. Hence

$$\sum_{l=0}^{m} {}^rC_l \; {}^{n-r}C_{m-l} = {}^nC_m. \tag{2}$$
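
Formula (1) and the identity (2) can be checked numerically. The sketch below is illustrative: the values n = 50, r = 20, m = 10 are arbitrary, and the function name `g` simply mirrors the notation of the text. It also reports the most probable value of l for comparison with mr/n.

```python
from fractions import Fraction
from math import comb

def g(l, n, r, m):
    """Probability that a random sample of m objects, from n of which r have
    the property b, contains exactly l objects with the property b."""
    if l < 0 or l > m or l > r or m - l > n - r:
        return Fraction(0)
    return Fraction(comb(r, l) * comb(n - r, m - l), comb(n, m))

n, r, m = 50, 20, 10   # illustrative values, not taken from the text
probs = [g(l, n, r, m) for l in range(m + 1)]
total = sum(probs)                                  # should be exactly 1, by (2)
mode = max(range(m + 1), key=lambda l: probs[l])    # most probable l
print(total, mode, m * r / n)
```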


It is easily proved directly by algebra that this is the case.
We have

$$g(l) = \frac{r!}{l!\,(r-l)!} \cdot \frac{(n-r)!}{(m-l)!\,(n-r-m+l)!} \cdot \frac{m!\,(n-m)!}{n!}, \tag{3}$$

and therefore

$$\frac{g(l+1)}{g(l)} = \frac{r-l}{l+1} \cdot \frac{m-l}{n-r-m+l+1}, \tag{4}$$

which is greater or less than 1 according as

$$l + 1 \text{ is less or greater than } \frac{(r+1)(m+1)}{n+2}. \tag{5}$$

The last quantity differs from rm/n by

$$\frac{r+m+1}{n+2} - \frac{2rm}{n(n+2)}, \tag{6}$$

which is always less than unity. It follows that changing l
to l + 1 will increase g(l) if l is less than mr/n, save for
a fraction, and will decrease it if l is greater than that value.
Hence the most probable value of l is the integer nearest to
mr/n; that is, the most probable sample is the fair sample,
such that the number of things with the property b in the
sample bears the same ratio to the total number of the sample
as the number of things with the property b in the whole
class bears to the number of the whole class.

For moderate values of n, m, r, and l the exact solution (1)
tells us all we need. But if we have a large class to begin with,
and extract from it a large sample, it can be proved (Lemma
II) that the sum of the values of g(l) for values of l between
$mr/n + p_1$ and $mr/n + p_2$ is very nearly $\tfrac12(\operatorname{erf}\xi_1 - \operatorname{erf}\xi_2)$,
where

$$\xi = \{2r(n-r)m(n-m)/n^3\}^{-1/2}\,p. \tag{7}$$

erf ξ vanishes for ξ = 0, but rapidly approaches +1 for
moderate positive values of ξ and -1 for negative values*.
If then $\xi_1$ is a moderate positive number and $\xi_2$ a moderate
negative one, the sum of the values of g(l) corresponding to
intermediate values of ξ will be nearly unity. The corresponding
range in l is such that l varies in it by a moderate
multiple of

$$H = \{2r(n-r)m(n-m)/n^3\}^{1/2}. \tag{8}$$

Then H may be taken as a measure of the range of values
of l that are probable. We notice that if r and n - r are both
moderate fractions of n, so that the original class was not
overwhelmingly b or not-b, and if the sample is only a small
fraction of the original set, so that m/n is small, then

$$2r(n-r)(n-m)/n^3, \tag{9}$$

which is at most 1/2, will be comparable with its maximum
value. Then H is about $(\tfrac12 m)^{1/2}$. The range of probable deviation
from a fair sample is of the order of $m^{1/2}$, where m is
the number of the sample. In general the probability that
the number of b's in the sample is between $l_0 \pm H$, where $l_0$
is the most probable value, is 0·843; the probability that it
is between $l_0 \pm 2H$ is 0·995; the probability that it is
between $l_0 \pm 3H$ is indistinguishable from unity.

* The following table will illustrate the point.

  ξ       erf ξ       ξ       erf ξ
  0       0           1·0     0·84
  0·2     0·22        1·4     0·95
  0·4     0·42        1·8     0·989
  0·6     0·60        2·2     0·998
  0·8     0·74        ∞       1·000

For negative values erf(-ξ) = -erf ξ.

As a specimen of the numerical results, consider the case
where the original class is equally divided, so that $r = \tfrac12 n$ and
$H = (\tfrac12 m)^{1/2}$. Take m = 100. Then $l_0 = 50$, H = 7, and the
probability is 0·995 that l is between 36 and 64. If instead
we take a sample of number 10,000, $l_0 = 5000$, H = 71,
and the probability is 0·995 that l is between 4858 and
5142. We notice the large size of the sample that has to be
taken to establish a high probability that the sample will
be fair within 1 per cent. of its total number.


3·2. In the above discussion we have supposed the composition
of the whole class known, and we have determined
the probabilities of different compositions of the sample.
The usual problem of sampling is the inverse one: given the
composition of the sample, what inferences can we draw
about the composition of the whole class? We make use of
formula 2·34 (4). Here let $p_r$ denote the proposition that
there are just r things in the original class with the property
b; then $P(p_r \mid h)$ is the prior probability that this value of r
is correct. Let us denote it by f(r). The verified proposition
q is here the fact that the known sample consists of l things
with the property b and m - l things without it. Then
$P(q \mid p_r\,.\,h)$ is the probability that a sample, m in number,
drawn from a class known to consist of r things with the
property b and n - r without it, would contain just l things
with the property. This is the function we called g(l), namely

$$g(l) = \frac{{}^rC_l \; {}^{n-r}C_{m-l}}{{}^nC_m}. \tag{1}$$

Since l is now to be kept constant while r varies, we shall
now call this function h(r). Now applying 2·34 (4) we have

$$P(p_r \mid q\,.\,h) = \frac{f(r)\,h(r)}{\sum_r f(r)\,h(r)}. \tag{2}$$

As usual we can make no further progress without some knowledge
of the form of the prior probability f(r). If there is no
previous reason to think one value of r more likely than any
other, f(r) is the same for all values of r. In that case the
posterior probability reduces to

$$P(p_r \mid q\,.\,h) = \frac{{}^rC_l \; {}^{n-r}C_{m-l}}{\sum_{r=0}^{n} {}^rC_l \; {}^{n-r}C_{m-l}}. \tag{3}$$

It is easy to show that this has its greatest value when r/l is
as near as possible to n/m; that is, the most probable value of
r is found by supposing that the known sample is a fair one.

Suppose that we wish to know the probability that the
next object examined will have the property b. Denote this
proposition by q'. Then the probability required is

$$P(q' \mid q\,.\,h).$$

Since one and only one of the propositions $p_r$ is true,

$$P(q' \mid q\,.\,h) = \sum_r P(q'\,.\,p_r \mid q\,.\,h) = \sum_r P(p_r \mid q\,.\,h)\,P(q' \mid p_r\,.\,q\,.\,h). \tag{4}$$

To evaluate $P(q' \mid p_r\,.\,q\,.\,h)$ we must suppose a definite value
of r chosen. l things with the property b have already been
removed, and therefore r - l remain. The total number of
things left to choose from is n - m. Hence the probability of
picking a thing with the property b at the next attempt is
$(r-l)/(n-m)$. Then

$$P(q' \mid q\,.\,h) = \frac{\sum_r \dfrac{r-l}{n-m}\; {}^rC_l \; {}^{n-r}C_{m-l}}{\sum_r {}^rC_l \; {}^{n-r}C_{m-l}}, \tag{5}$$

which is equal, by Lemma III, to

$$\frac{(l+1)\,(n+1)!}{(n-m)!\,(m+2)!} \bigg/ \frac{(n+1)!}{(m+1)!\,(n-m)!} = \frac{l+1}{m+2}. \tag{6}$$

We notice that the probability of drawing another thing with
the property b at the next attempt depends wholly on the
composition of the sample already drawn, and not on n. If
m is large and l equal to m, this probability approaches unity,
but never quite reaches it.

Consider next the probability that the whole of the class
may have the property b. For the possibility to arise it is
obvious that all the known instances must be b's; that is,
l must equal m. In this case r = n, and

$$P(p_n \mid q\,.\,h) = \frac{n!}{m!\,(n-m)!} \bigg/ \frac{(n+1)!}{(m+1)!\,(n-m)!} = \frac{m+1}{n+1}. \tag{7}$$

This approaches unity only when m is nearly n, that is, when
nearly the whole of the class has been examined. It appears
that pure sampling methods will never establish a high probability
for the proposition that the whole of the set is of one
type.
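
Results (6) and (7) can be confirmed by direct summation under the uniform prior, without appeal to Lemma III. The following sketch uses exact fractions; the values of n, m and l are arbitrary illustrations, and the function names `posterior_r` and `next_has_b` are hypothetical.

```python
from fractions import Fraction
from math import comb

def posterior_r(n, m, l):
    """Posterior P(p_r | q.h) for r = 0..n when f(r) is constant, as in 3·2 (3)."""
    weights = [comb(r, l) * comb(n - r, m - l) for r in range(n + 1)]
    total = sum(weights)
    return [Fraction(w, total) for w in weights]

def next_has_b(n, m, l):
    """Probability that the next object examined has the property b, as in 3·2 (5)."""
    post = posterior_r(n, m, l)
    return sum(post[r] * Fraction(r - l, n - m) for r in range(n + 1))

n, m, l = 30, 8, 5   # illustrative values with m < n
print(next_has_b(n, m, l), Fraction(l + 1, m + 2))
print(posterior_r(n, m, m)[n], Fraction(m + 1, n + 1))
```

Changing n alone leaves the first result unaltered, in agreement with the remark that it depends only on the composition of the sample.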


3·3. The above analysis, which is due to Laplace, has been
repeatedly attacked. It obviously depends fundamentally on
the form of the function f(r), which is taken constant by
Laplace, and represents the prior probability that a value of
r is correct. But if our data included the proposition that the
number of balls with the property b in the bag is just s, say,
the prior probability f(r) is 1 for r = s and zero for all other
values of r. Referring back to 3·2 (2) we see that the posterior
probability of a given r is also 1 for r = s and zero for all other
values of r, and is entirely unaffected by the composition of
the sample, as we should of course expect. Also the objects
unexamined are n - m in number, and include s - l with the
property b. Hence the probability that the next one examined
has the property b is $\dfrac{s-l}{n-m}$, and obviously decreases as l increases.
We should of course expect this; the more b's have
been examined the less likely are we to find one among the
unexamined objects. But it is qualitatively different from the
result of Laplace's theory, which indicates that the probability
of a b at the next trial increases steadily with the number of b's
in the sample. Obviously if we take l = s the whole of the b's
have already been removed and the probability that one will
be found at the next trial is zero.

3·31. The form of the prior probability is therefore a matter
of great importance. Laplace's theory has often been criticized
on the ground that there is no reason to suppose that
f(r) is constant, and therefore that the theory rests on no
foundations whatever. But this criticism misses the whole
point of the theory. It is an instance of that already discussed
in 2·4. Either we have reasons to prefer one value of
r to another or we have not. In the latter case f(r) is definitely
constant and Laplace's theory is correct. In the former case
Laplace's theory is simply inapplicable. The theory is in fact
right when we have no previous knowledge of the composition
of the class, but becomes inapplicable when we have relevant
knowledge before we take the sample. The introduction of
the function f(r) makes it possible to allow for previous
knowledge.
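
The part played by f(r) can be made concrete by computing the probability of a b at the next trial for two different priors. The sketch below is an illustration under assumed values (n = 20, m = 6, s = 9); the names `posterior_r`, `next_has_b`, `uniform` and `known` are hypothetical. It contrasts the constant prior of Laplace with the prior that is 1 at r = s and zero elsewhere.

```python
from fractions import Fraction
from math import comb

def posterior_r(n, m, l, prior):
    """Posterior over r = 0..n from 3·2 (2); prior[r] plays the part of f(r)."""
    weights = [prior[r] * comb(r, l) * comb(n - r, m - l) for r in range(n + 1)]
    total = sum(weights)
    return [Fraction(w) / total for w in weights]

def next_has_b(n, m, l, prior):
    """Probability that the next object examined has the property b."""
    post = posterior_r(n, m, l, prior)
    return sum(post[r] * Fraction(r - l, n - m) for r in range(n + 1))

n, m, s = 20, 6, 9   # illustrative values
uniform = [Fraction(1, n + 1)] * (n + 1)                 # Laplace: f(r) constant
known = [Fraction(int(r == s)) for r in range(n + 1)]    # f(r) = 1 at r = s only

for l in (2, 3, 4):
    print(l, next_has_b(n, m, l, uniform), next_has_b(n, m, l, known))
```

With the constant prior the probability rises with l, as (l + 1)/(m + 2); with the composition known it falls, as (s - l)/(n - m).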

Though Laplace’s theory is correct in the circumstances 
specified, the cases where it is not applicable are very 
numerous and important. Suppose for instance that there 
are n balls in a well-shaken bag, that they are known to be all 
of the same size, and that we have been told that one of them 
is a cricket ball and red. Then there is a strong prior pro- 
bability that all are cricket balls and red. The only likely 
alternative is that hockey balls, which are of the same size but 
white, may be mixed with them, and we know that appliances 
for different games are usually kept separate. If then r is the
number of red balls in the bag, f(r) is very large for r = n
and small for all other values.

On the other hand if we have merely observed by feeling
that the balls are about the size and weight of hockey or
cricket balls, then f(r) is large for r = 0 and r = n and small
for intermediate values. The extraction of a single ball then
establishes practical certainty that all the balls are cricket
balls or all hockey balls, as the case may be*.

Again, suppose that the balls are known to be tennis balls
and to be awaiting use in a match. If l of them have been
examined and found white, the probability that the next will
be white, on Laplace's theory, is (l + 1)/(l + 2). If l = 2 this
is 3/4. But the actual probability in these circumstances is
nearly unity, since one knows by previous experience that
only new and white balls would be used in a match. If on the
other hand the balls belong to an ordinary player's set towards
the end of the summer there is a considerable prior probability
that most of them will be green, and this affects the
posterior probability in the opposite direction. In all these
cases the departure of f(r) from constancy materially affects
the posterior probability. In addition we have the general
knowledge that like things tend to be associated, as in the case
of the cricket and hockey balls. Allowing for this we should
expect f(r) to be larger for r small and r nearly equal to n than
for intermediate values. In some cases where f(r) is not
uniform it can be evaluated and the posterior probability can
be found completely. But the determination of f(r), when it
is not constant, is usually troublesome, and we shall show that
it is also often unnecessary.

* See also later, 10·1.

3·4. Suppose now that the original class had number n,
where n is very large, and that the number of the sample, m,
is also large. The posterior probability that the whole class
contained r things with the property b is

$$\frac{f(r)\,h(r)}{\sum_r f(r)\,h(r)},$$

where, by Lemma II,

$$h(r) = \left\{\frac{n}{2\pi r(n-r)xy}\right\}^{1/2} \exp\left\{-\frac{n p^2}{2r(n-r)xy}\right\}, \tag{1}$$

$$x = m/n; \qquad y = (n-m)/n; \tag{2}$$

$$l = rm/n + p. \tag{3}$$

We are now treating l as known and r as variable; the problem
is the inverse of that of 3·1. The exponential is greatest when
p = 0, that is, when

$$r = nl/m = r_0, \tag{4}$$

say. Put

$$r - r_0 = n\theta, \tag{5}$$

so that θ measures the departure of the composition of the
whole class from that of the sample. Then

$$p = l - rm/n = (m/n)(r_0 - r) \tag{6}$$

$$= -m\theta. \tag{7}$$

The index of the exponential is therefore, neglecting $\theta^3$,

$$-\frac{1}{2}\,\frac{n^3 m^2 \theta^2}{r_0(n-r_0)\,m(n-m)} = -\frac{1}{2}\,\frac{n m^3 \theta^2}{l(m-l)(n-m)} = -h^2\theta^2, \tag{8}$$
say. When m is large absolutely, but small compared with n,
$h^2$ is greater than 2m, becoming equal to this minimum when
$l = \tfrac12 m$. As in the case of direct sampling, therefore, the exponential
factor is insignificant except within a range of
values of θ comparable with $m^{-1/2}$. In these conditions we may
ignore the variation of r in the factor outside the exponential
in (1), and simply treat h(r) as proportional to the exponential.
The probability that the true value of r lies within a given
range is then proportional to the sum of the values of
$f(r)\exp(-h^2\theta^2)$
for values of r within that range. When m is large, h(r)
is negligible except when θ, which measures the departure of
the sample from fairness, is of order $m^{-1/2}$. The whole range
of possible variation of r makes θ vary by unity. In these conditions,
if f(r) is a function of such types as usually arise,
appreciable contributions to $\sum f(r)\,h(r)$ arise only from the
range of values that make the exponential moderate, and
within this range f(r) will not vary greatly. In ordinary cases
f(r) may be considerable for r small and r nearly equal to n,
and may have one minimum between. Then when m is great
we can treat f(r) as constant within the range that matters,
and it cancels from the numerator and denominator of the
posterior probability. The sum may now be replaced by an
integral, and the probability that r lies between $r_0 + n\theta$ and
$r_0 + n(\theta + d\theta)$ is proportional to $\exp(-h^2\theta^2)\,d\theta$, or is

$$\frac{h}{\sqrt{\pi}}\exp(-h^2\theta^2)\,d\theta.$$

Hence the probability that r lies between $r_0 + n\theta_1$ and
$r_0 + n\theta_2$ is

$$\frac{h}{\sqrt{\pi}}\int_{\theta_1}^{\theta_2}\exp(-h^2\theta^2)\,d\theta = \tfrac12(\operatorname{erf} h\theta_2 - \operatorname{erf} h\theta_1). \tag{9}$$

This result shows that, except in cases so remarkable that 
they must be easily recognized if they arise, the actual varia- 
tion of the prior probability with r is not important provided 
that the sample is large. This is the real reason why it is un- 
necessary in most cases to evaluate the prior probability. 
Within ordinary limits its effect on the answer is negligible. 
In fact the range of values of θ such that the truth is practically
certain to lie within it is of order $m^{-1/2}$. To make this of
order 1 per cent. we need a sample of number 10,000 or so;
while the only important values of f(r) come from values of
r within a range of order 1 per cent. of the whole possible
range, and in such a small range we cannot expect the variation
of f(r) to matter. In fact the large sample is necessary
in any case, to ensure fairness ; and when so large a sample has 
been taken that fairness is assured the results will in any case 
be the same as if the prior probability was constant. 

We can now sum up the position concerning the prior 
probability in the theory of sampling. There is no theoretical 
difficulty. The opinion that there is any fundamental objection
to the notion of prior probability can be maintained only
at the cost of rejecting the notion of probability, and with it 
the universally accepted opinion that it is possible to start 
from ignorance and gradually build up from experience 
methods of predicting the truth. There are practical diffi- 
culties in assessing the prior probability in many cases as they 
actually arise. This is not a situation to evade, but one to 
face. It could be dealt with in two ways: we may either 
evaluate the prior probability or swamp it. The former 
alternative is laborious and unnecessary; for in any case a 
large sample is needed to make it practically certain that the 
sample is a nearly fair one, and then the posterior probability 
of a given departure from fairness is almost the same what- 
ever the prior probability may be. We do not evaluate the 
prior probability in practical sampling because we do not 
need to ; we swamp it automatically when we take a sufficiently 
large sample. It is this principle that constitutes the theo- 
retical justification of statistical methods. 
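
The swamping of the prior probability can be illustrated numerically. The sketch below is a simplification under stated assumptions: it replaces sampling from a finite population by binomial sampling on a grid of 999 possible proportions, and compares a flat prior with a U-shaped prior that is large near both ends and has one minimum between. The grid, the two priors, the sample counts and all function names are assumptions chosen for illustration, not taken from the text.

```python
import math


def posterior(prior_weights, thetas, successes, trials):
    """Normalized posterior on a grid, computed in logs to avoid underflow."""
    logs = [
        math.log(w) + successes * math.log(t) + (trials - successes) * math.log(1 - t)
        for w, t in zip(prior_weights, thetas)
    ]
    top = max(logs)
    unnormalized = [math.exp(v - top) for v in logs]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def total_variation(p, q):
    """Half the summed absolute difference between two distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


GRID = 1000
thetas = [k / GRID for k in range(1, GRID)]
flat = [1.0 for _ in thetas]                      # assumed prior 1
u_shaped = [1.0 / (t * (1 - t)) for t in thetas]  # assumed prior 2


def prior_effect(trials, successes):
    """Distance between the posteriors obtained from the two priors."""
    return total_variation(
        posterior(flat, thetas, successes, trials),
        posterior(u_shaped, thetas, successes, trials),
    )


small_sample = prior_effect(10, 6)
large_sample = prior_effect(10_000, 6_000)
print(small_sample, large_sample)
```

With the small sample the choice of prior still moves the posterior appreciably; with the large one the two posteriors are nearly indistinguishable, which is the sense in which the prior is swamped.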

3·5. We may now return to the problem of the baby that has
seen only two objects, one blue and one yellow. What is the
probability that the next object seen will be blue? The number
of other colours in the world may be anything; but we can
state the issue by considering “blue or yellow” as a single
property, as opposed to “not blue or yellow”. Then there
are two known instances of the property “blue or yellow”
and none of its absence. Thus in the theory of sampling
m = l = 2, and the posterior probability that the next object
seen will be blue or yellow is (l + 1)/(m + 2) or 3/4, whatever
the total number of objects in the world may be. Since on
the data blue and yellow are equally probable, the probabilities
that the next object will be blue, yellow, or some
other colour are respectively 3/8, 3/8, and 1/4.

It would not be legitimate to proceed by treating blue and 
yellow as single independent alternatives. If for instance we 
treated blue as one alternative and “not blue” as the other,
then we should have m = 2, l = 1, and the probability of a
blue object at the next trial would be 1/2. Similarly the probability
of a yellow one would be 1/2; and taking the two together
we should say that the probability that the next object
will be blue or yellow is 1. This is absurd. The error is that
in treating blue as a single alternative and applying Laplace’s
theory we suppose all numbers of blue things in the world
equally probable a priori; similarly for yellow things. Thus
we have made no allowance for the fact that it is impossible 
in the same circumstances for more than half the things in the 
world to be blue and more than half yellow ; the prior proba- 
bilities of given numbers of blue and yellow things in the 
world are not independent. 
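
The arithmetic of this example can be checked with exact fractions. The sketch below applies Laplace's rule (l + 1)/(m + 2) first to the combined property “blue or yellow” and then, illegitimately, to blue and yellow as separate alternatives. The function and variable names are hypothetical.

```python
from fractions import Fraction


def rule_of_succession(instances, trials):
    """Laplace's rule: probability of the property at the next trial."""
    return Fraction(instances + 1, trials + 2)


# Legitimate treatment: "blue or yellow" as one property, seen 2 times in 2.
p_blue_or_yellow = rule_of_succession(2, 2)
p_blue = p_blue_or_yellow / 2      # blue and yellow equally probable on the data
p_yellow = p_blue_or_yellow / 2
p_other = 1 - p_blue_or_yellow

# Illegitimate treatment: blue alone (1 in 2) and yellow alone (1 in 2).
naive_blue = rule_of_succession(1, 2)
naive_yellow = rule_of_succession(1, 2)
naive_blue_or_yellow = naive_blue + naive_yellow

print(p_blue, p_yellow, p_other, naive_blue_or_yellow)
```

The second treatment leaves no probability at all for an unforeseen colour, which is the absurdity pointed out above.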

The purpose of this trivial example is to illustrate the fact 
that allowance can actually be made in probability estimates 
for the possibility that an unforeseen alternative may arise. 





CHAPTER IV 


QUANTITATIVE LAWS 

'Tis a lesson you should heed,
    Try again;
If at first you don’t succeed,
    Try again;
Then your courage should appear,
For if you will persevere
You will conquer, never fear.
    Try again.

        William Edward Hickson

4·1. The majority of the laws of physics are of the form

$$y = f(x_1, x_2, x_3, \ldots, x_n), \tag{1}$$

where $y, x_1, x_2, \ldots, x_n$ are quantities determined by measurement,
and f is a known mathematical function. Such a law
enables us to calculate y when the x's are known. These laws
are established by repeated verification; it is found in
numerous instances that the observed value of y agrees closely
with that calculated from the law, and on the strength of this 
verification it is asserted that the law holds in general. Super- 
ficially the generalization bears a resemblance to that involved 
in Laplace’s theory of sampling when all the objects examined 
have hitherto been of the same type, but we shall see that the 
differences are very great. For instance, we may say that the 
position of Jupiter, as calculated from the law of gravitation, 
has agreed with prediction every time it has been observed 
during a revolution. But at the best the number of veri- 
fications is finite. What is the probability that the position of 
Jupiter always agrees with the calculated value? We are here 
generalizing from a finite number of verifications to an in- 
finite class of possible instances, and if we apply the ordinary 
rules of sampling we must say that the probability is infinitesimal.
On the other hand any astronomer would say
that it is practically certain that the position of Jupiter always
agrees with prediction, unless indeed he said it was absolutely
certain.

4·2. The quantitative laws of physics therefore seem to be
in a somewhat different position from the rules established 
by sampling, and further inquiry into their nature is desirable. 
Let us consider first a simple experiment. A solid of revolu- 
tion can roll down an inclined plane, and its displacement is 
observed every fifth second after it starts from rest. If we 
denote the time by t and the displacement from the starting
point by x, the observations are as follows:

t (seconds)        0    5   10   15   20   25   30
x (centimetres)    0    5   20   45   80  125  180

Then we can say that at all the instants of observation the
displacement is connected with the time by the formula

$$5x = t^2. \tag{2}$$

On the face of it this statement is a pure description of ob- 
served facts. The phenomenalist school of critics would say 
that it is nothing more ; and many physicists think that they 
belong to this school. But the facts of observation would be 
fitted equally well if the displacement was really connected 
with the time by the formula 

$$5x = t^2 + t\,(t - 5)(t - 10)(t - 15)(t - 20)(t - 25)(t - 30)\,f(t), \tag{3}$$

where f(t) may be any function whatever that is not infinite
at t = 0, 5, 10, ... 30 seconds. The law (2) is indeed not the
only description that fits the data ; it is only one of an infinite 
number of laws that would fit the data equally well. Its 
special quality that distinguishes it from the other possible 
laws is its simplicity. In practice no physicist, looking at the 
above data, would hesitate to say that the law (2) is the 
correct way of expressing them. But different physicists 
would disagree about their reasons for adopting it. Some
would say that it is a matter of strict necessity. This is just
false; there are an infinite number of other alternatives that 
might be adopted if we chose. We want to know why there 
is only one that we would choose. Others would say that 
the simplest law is chosen for the sake of convenience. But 
a simple test would show that this is not the real reason. 
Suppose we want to know where the body was 18 seconds
from the start. According to the law (2) it would be 64·8 cm.
from the starting point. But according to the more general
description (3) it might be anywhere, according to the value
of f(t) for t = 18 seconds. But what would happen if we
said this to a physicist? He would certainly say “Don’t be
silly”. Suppose that we pressed him, and that as a result he
was persuaded to repeat the experiment and found that the
displacement 18 seconds from the start was 55 cm. He would
still not abandon the form (2). He would do the whole experiment
again in order to find out why $x/t^2$ had changed;
and he would expect to find that in the new experiment the
values of $x/t^2$ at different times were again all equal, and
different from the value 0·2 cm./sec.² which he found before.
(Further, he would find this to be so; and he would probably
attribute the change to an alteration in the slope of the plane.
But that is not our present point.) In fact the physicist,
having once found x proportional to $t^2$ for a wide range of
values of t, feels a complete confidence that this rule holds 
for other values of t. This confidence could not exist if he 
had chosen the simple law merely because it was convenient. 
He must have chosen it because, of all the laws that would 
fit the data, the simplest is the most likely to be correct for 
other values of the variables. 
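
The contrast between law (2) and the family (3) can be made concrete. In the sketch below, `simple_law` is law (2) and `rival_law` is law (3) for a caller-supplied f(t); the three particular choices of f are arbitrary assumptions made only for illustration, and all names are hypothetical.

```python
import math

TIMES = [0, 5, 10, 15, 20, 25, 30]
OBSERVED = [0, 5, 20, 45, 80, 125, 180]


def simple_law(t):
    """Law (2): 5x = t^2."""
    return t ** 2 / 5


def rival_law(t, f):
    """Law (3): 5x = t^2 + t(t-5)(t-10)...(t-30) f(t)."""
    vanishing = t
    for node in TIMES[1:]:
        vanishing *= (t - node)   # product is zero at every observed instant
    return (t ** 2 + vanishing * f(t)) / 5


# Arbitrary illustrative choices of f(t); any finite function would do.
choices = [lambda t: 1.0, math.sin, lambda t: 1e-6 * t]

for f in choices:
    fits = all(rival_law(t, f) == x for t, x in zip(TIMES, OBSERVED))
    print(fits, simple_law(18), rival_law(18, f))
```

Every choice reproduces the seven observations exactly, yet the predictions for t = 18 seconds differ from one another and from the 64·8 cm. of the simple law.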

Let us put the matter in another way. Some physicists 
would say that the law (2) is adopted because it is observed 
to be true. But this statement is merely a mathematical pun. 
What is observed is that for t = 0, 5, 10, 15, 20, 25, 30
seconds, $5x = t^2$. What is asserted is that for all values of t,
$5x = t^2$. The former statement is merely a concise way of
rewriting the observations, a shorthand description. The latter
is an inference from the finite number of actual observations
to an infinite number of possible observations. To express
both by saying simply “$5x = t^2$” is to use the same language
to mean two different things. In the same way, the law (3)
applied to the observed values is definitely true; but no
physicist would apply it to the unobserved values. In fact
the preference for the simple law enters the question only
when the need for making inferences arises. Convenience of
description has nothing to do with the matter, unless we
choose to say that “$5x = t^2$ for other than observed values
of t” is a description. That is another pun; to describe an
observation that has been made obviously does not mean the
same thing as to describe an observation that has not been
made. The word description is here restricted to descriptions
of observed events; other events are inferred, not described.

We have seen that if we have a set of possible general laws $p_r$
and a verified consequence q whose probability on all the
laws is the same,

$$P(p_r \mid q.h) = \frac{P(p_r \mid h)}{\sum_{s} P(p_s \mid h)}.$$

In other words, $P(p_r \mid q.h)/P(p_r \mid h)$ is the same for all the
laws. Now in our case law (2) and all the laws (3) imply the
observed facts. Hence their posterior probabilities are in the
same ratios as the prior probabilities. The physicist’s confidence
in the generality of the simple law in comparison
with complex ones that fit the observations equally well must 
therefore correspond to an overwhelmingly greater prior 
probability for the simple law. Here, then, we come upon 
the essence of the problem. The prior probability of a simple 
law is so great in comparison with those of complex ones that, 
from a physicist’s point of view, the latter are not worth 
considering. It is this fundamental principle that accounts 
for the physicist’s preference for the simple law. 

The above argument is actually an understatement of the
situation. In the inclined plane experiment the observations
would not, in fact, fit the simple law exactly. We might get
a series of values like the following:

t (seconds)        0    5   10   15   20   25   30
x (centimetres)    0    5   19   44   81  124  178

These do not fit exactly the law $5x = t^2$ or any other simple
square law. But it would be easy to find a polynomial of the
form

$$x = a_0 + a_1 t + a_2 t^2 + \ldots + a_6 t^6$$

that would fit the observations exactly. Nevertheless the 
physicist would stick to the square law. His expressed reason 
would be interesting. It would be that any set of seven 
values whatever can be represented by an expression with 
seven adjustable constants. Consequently the expression so 
obtained tells us nothing with regard to the reliability of the 
determination. The very fact that the representation is of 
such generality that it can always be made to fit the data 
exactly is considered an argument against it, not for it. With 
regard to the original square law, he would say that the observed
values never differ from the calculated ones by more
than 1 cm., except for the last; this differs by 2 cm., but at
the time the velocity is 12 cm./sec., and the difference could
be accounted for by an error in timing of 0·17 second, while
the observations were made only to 0·2 second. In fact he
would say that the differences never exceed the admissible 
errors of observation, and that the agreement of the observa- 
tions with the simple law is perfectly satisfactory. 

Apart from the physicist’s specified reasons, which we shall 
have occasion to consider later, we notice the outstanding fact 
about his decision. His predilection for the simple law is so 
strong that he will retain it, even when it does not fit the 
observations exactly, in spite of the existence of complex 
laws that do fit them exactly. Simplicity is a better guarantee 
of probability than accuracy of fit. The physicist would use
the square law to predict the value of x for t = 60 seconds,
and would expect the result to be right within a few centimetres,
provided the plane was long enough to permit the
displacement required. He would, on the other hand, expect 
the polynomial of seven terms to give a seriously wrong 
answer when extrapolated to such an extent. 
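
The extrapolation to t = 60 seconds can be tried on the second table of observations (x = 0, 5, 19, 44, 81, 124, 178 cm.). The sketch below builds the unique polynomial of seven terms through all seven points, using exact rational Lagrange interpolation, and compares it with a square law whose single constant is fitted by least squares. The least-squares fit and the function names are assumptions of this sketch; the text does not say how the physicist would fix the constant.

```python
from fractions import Fraction

TIMES = [0, 5, 10, 15, 20, 25, 30]
OBSERVED = [0, 5, 19, 44, 81, 124, 178]


def interpolating_polynomial(t):
    """Degree-6 polynomial through all seven points (Lagrange form, exact)."""
    total = Fraction(0)
    for i, (ti, xi) in enumerate(zip(TIMES, OBSERVED)):
        term = Fraction(xi)
        for j, tj in enumerate(TIMES):
            if j != i:
                term *= Fraction(t - tj, ti - tj)
        total += term
    return total


def fitted_square_law(t):
    """x = c * t^2 with c chosen by least squares on the same data."""
    c = Fraction(
        sum(x * s * s for s, x in zip(TIMES, OBSERVED)),
        sum(s ** 4 for s in TIMES),
    )
    return c * t * t


print(float(interpolating_polynomial(60)))  # seven-constant polynomial
print(float(fitted_square_law(60)))         # one-constant square law
print(60 ** 2 / 5)                          # the law 5x = t^2
```

The seven-term polynomial fits the data exactly but gives 18,023 cm. at t = 60 seconds, whereas the fitted square law gives about 714 cm., within a few centimetres of the 720 cm. predicted by $5x = t^2$.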

The actual behaviour of physicists in always choosing in 
practice the simplest law that fits the observed facts therefore 
corresponds exactly to what would be expected if they re- 
garded the probability of making correct inferences as the 
chief determining factor in selecting a definite law out of an 
infinite number that would satisfy the observations, and if 
they considered the simplest law as having far the greatest 
prior probability. It is not explained by the reasons that are 
usually stated. 

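4·3. We may also consider the problem as one of pure theory
of probability without considering the behaviour of the
physicist. We return to the law

$$P(p \mid q.h) = \frac{P(p \mid h)}{P(q \mid h)}.$$

Suppose that p is the general law whose probability we are
considering, and that $q_1, q_2, \ldots, q_n$ are successive verified predictions
from it. If $q_2$ is implied by p, it is also implied by
p and $q_1$ together. Thus we have in turn

$$P(p \mid q_1.h) = \frac{P(p \mid h)}{P(q_1 \mid h)},$$

$$P(p \mid q_1.q_2.h) = \frac{P(p \mid q_1.h)}{P(q_2 \mid q_1.h)},$$

$$P(p \mid q_1.q_2 \ldots q_n.h) = \frac{P(p \mid q_1.q_2 \ldots q_{n-1}.h)}{P(q_n \mid q_1.q_2 \ldots q_{n-1}.h)}.$$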

Thus each successive verification divides the probability of 
the law by the probability of the verification on the data 
already known. Now if all the numbers of the form

$$P(q_n \mid q_1.q_2 \ldots q_{n-1}.h)$$

were less than some proper fraction r, and p had a finite
probability at any stage of the investigation, then a sufficient 
number of further verifications would give p a probability 
greater than unity, which is impossible. Hence we have to 
choose between two alternatives : 

(1) However often p may be verified, its probability on the 
data is never finitely different from zero. 

(2) The probabilities of the verifications, given in each 
case the previous verifications, are not all less than r if r is 
less than unity; that is, when the number of verifications 
becomes large, the probability of the next tends to unity as 
a limit. 

The first alternative plainly does not agree with ordinary 
belief. However sceptical one may be about a given law that 
is consistent with the known facts, one would consider its 
probability finite. The second alternative, on the other hand, 
agrees perfectly with our fundamental belief in the possibility 
of acquiring knowledge by experience. But it says nothing 
about the probability of the law itself, but only of verifications 
of it. It might apparently be possible to adopt the second 
alternative and still suppose the probability of the general 
law infinitesimal. 
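
The step that forces the choice between the two alternatives is simple arithmetic: each verification divides the probability of the law by a number less than r, so a finite starting probability must eventually be pushed above unity. The sketch below counts how many divisions by r itself would be needed; the starting probability and the value of r are illustrative assumptions, and the function name is hypothetical.

```python
def verifications_until_absurd(prior, r):
    """Number of divisions by r after which the 'probability' exceeds 1.

    Dividing by r is the slowest case allowed by the supposition that every
    verification has probability less than r, so the count is an upper bound.
    """
    if not (0 < prior <= 1 and 0 < r < 1):
        raise ValueError("need 0 < prior <= 1 and 0 < r < 1")
    probability, count = prior, 0
    while probability <= 1:
        probability /= r
        count += 1
    return count


# Illustrative values only: a law with probability one in a million,
# and verifications each of probability below 0.9.
print(verifications_until_absurd(1e-6, 0.9))
```

However small the finite starting value, the count is finite, so either the starting value is not finite or the probabilities of the verifications do not all stay below r.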

But the construction of a satisfactory theory on such a basis 
would require a branch of mathematics that does not exist. 
Let us see whether a theory of quantitative inference can be 
constructed on the hypothesis that all general laws have the 
same prior probability. Suppose the number of such laws to
be m, and suppose that a number of experiments have been
made to test them. Then the only survivors are those that
imply the results of the experiments, which may be summed
up in the proposition q. Each of them after the experiments
has the probability $1/\{m\,P(q \mid h)\}$. Thus every surviving law
has the same probability after the experiments. Also since an 
infinite number of laws satisfy any finite number of measures, 
an infinite number survive, and the posterior probability of 
each is infinitesimal. Now suppose another verification to be
attempted. An infinite number of results are possible, corresponding
to the different laws, and each result can be obtained
from an infinite number of laws. The probability of a
given numerical result at the next trial is therefore the ratio 
of two infinite numbers; and nobody has yet succeeded in 
constructing a satisfactory mathematical theory of such ratios. 
Until it is done we shall say that it is impossible to construct 
a theory of quantitative inference on the hypothesis that all 
general laws have the same prior probability. 

4·4. Our effort to avoid the assumption that general laws
have finite probabilities has thus led nowhere. Let us now 
make this assumption and investigate its implications. The 
number of possible laws is certainly infinite. How can an 
infinite number of mutually inconsistent laws all have finite 
probabilities? The answer to this question is provided by 
mathematics. Consider the series 


$$\tfrac{1}{2} + \tfrac{1}{4} + \tfrac{1}{8} + \tfrac{1}{16} + \ldots$$


The number of terms in this series is infinite, but every term 
is finite, the sum of any number of terms is less than unity, 
and the sum tends to unity as we take an increasingly large 
number of terms from the start. The assumption we need is 
therefore that the prior probabilities of possible general laws 
are the terms of a convergent series whose sum to infinity is 
unity. We have been led to it purely from the assumption 
that it is possible to construct a theory of quantitative in- 
ference; if this can be done such an assumption about the 
prior probabilities of laws must be made. Further, we see 
now that it fits in perfectly with our discussion of the relation 
of simplicity to prior probability; all we need to add is that 
the simpler the law is, the earlier its probability occurs in the 
series. Simplicity is a property that is easily recognizable 
when it is present, and we say that the order of decreasing 
simplicity among laws is also the order of decreasing prior 
probability.

If we make this assumption we find that there must be a
severe restriction on the laws that are admissible at all. The
terms of an infinite series are $\aleph_0$ in number*, and according
to our rule no law whose probability is not a term of the series
can ever be established by experience. Hence the quantitative
laws capable of being established are $\aleph_0$ in number, and
our problem is to specify a set of laws, $\aleph_0$ in number, that
will include all laws required, or likely to be required, in
physics.
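
The convergent series of 4·4 can be checked with exact arithmetic. In the sketch below the k-th law in order of decreasing simplicity is given prior probability $2^{-k}$; this particular series is the one quoted as an example in the text, and its use as an actual assignment of priors is an assumption of the sketch. The function names are hypothetical.

```python
from fractions import Fraction


def prior(k):
    """Prior probability of the k-th law (k = 1, 2, 3, ...), taken as 2^-k."""
    return Fraction(1, 2 ** k)


def partial_sum(n):
    """Total prior probability of the first n laws."""
    return sum((prior(k) for k in range(1, n + 1)), Fraction(0))


for n in (1, 2, 5, 10, 20):
    print(n, partial_sum(n), float(1 - partial_sum(n)))
```

Every term is finite, every partial sum is less than unity, and the shortfall from unity halves with each additional law.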

In one sense it might be said that the problem is trivial, 
since the number of known physical laws at any time is finite, 
and likely to remain so. Nevertheless there is a theoretical 
problem apart from the actual facts; we are concerned not 
only with what is true, but with what is possible — or rather 
with what it would be possible to establish. 

It is plain that not all functions are admissible in laws; for
the number of all functions is $C^C$, which is greater than $\aleph_0$.
The same applies to the class of continuous functions. Even
if we restrict the functions to be analytic, the number of such
functions is still C, or $2^{\aleph_0}$, which is greater than $\aleph_0$; not all
analytic functions can occur in physical laws. If a physical
law contains one numerical constant capable of continuous 
variation, the number of possible values of that constant is C; 
if the coefficients in the expression of an analytic function in 
a power series are restricted to be rational fractions, the 
number of functions is still C. The class of all polynomials 
of degree less than some finite number, and with rational 
fractions or algebraic numbers as coefficients, has number $\aleph_0$,
but it does not include trigonometrical functions, which do 
occur in physics, and therefore is not sufficiently general. 

Transcendental functions such as exponential, trigonometric,
and Bessel functions do occur in physics, but we
may notice that they are hardly ever derived directly from
observation. They arise first in theoretical work, and it is not
till afterwards that it is verified that they do satisfy the results
of observation. In the theoretical work they arise as the solutions
of differential equations of finite order and degree.

* $\aleph_0$ is the number of positive integers, and is the smallest infinite
number. C is the number of values of any quantity capable of continuous
variation; and in particular is the number of real numbers. See Appendix.

4·5. Consider then the possibility of defining a class of
differential equations, $\aleph_0$ in number. Clearly no numerical
coefficient in such a class may be capable of more than $\aleph_0$
values, otherwise the hypothesis would be vitiated at the start.
But if each equation is restricted to be of finite order and
degree, and each coefficient in it to be capable of $\aleph_0$ values at
most, then the conditions are satisfied. (See later, Appendix.)
The natural possibilities to consider for the coefficients are 
that they may be whole numbers or rational fractions. The 
latter alternative appears more general, but is not so in fact, 
for any equation with rational coefficients can be converted 
into one with integral coefficients by merely multiplying by 
the least common denominator. There is indeed a definite 
advantage in choosing the former alternative ; for an equation 
involving only integers with no common factor is equivalent 
to no other equation with the same property, whereas an 
equation with fractional coefficients is equivalent to an in- 
definite number of others with fractional coefficients. Thus 
the use of fractional coefficients would permit ambiguities in 
the arrangement of the equations in order of decreasing sim- 
plicity, which are avoided by the restriction to integers. All 
our data are therefore consistent with the following general 
principle : 

Every quantitative law can be expressed as a differential
equation of finite order and degree, in which the numerical
coefficients are integers.
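
The reduction of rational coefficients to integers with no common factor can be sketched as follows. The code treats an equation simply as a list of its numerical coefficients, which is an assumption made for illustration, and it fixes the overall sign by making the first non-zero coefficient positive, a convention of this sketch that the text does not state. The function name is hypothetical.

```python
from fractions import Fraction
from math import gcd


def integer_form(coefficients):
    """Clear fractions, then divide out the common factor of the integers."""
    exact = [Fraction(c) for c in coefficients]

    # Least common denominator of all the coefficients.
    common_denominator = 1
    for value in exact:
        d = value.denominator
        common_denominator = common_denominator * d // gcd(common_denominator, d)
    integers = [int(value * common_denominator) for value in exact]

    # Greatest common factor of the resulting integers.
    common_factor = 0
    for n in integers:
        common_factor = gcd(common_factor, n)
    if common_factor == 0:
        raise ValueError("all coefficients are zero")
    reduced = [n // common_factor for n in integers]

    # Sign convention (an assumption of this sketch).
    if next(n for n in reduced if n != 0) < 0:
        reduced = [-n for n in reduced]
    return reduced


# Two equivalent equations with fractional coefficients reduce to one form.
print(integer_form([Fraction(1, 2), Fraction(-3, 4), 2]))
print(integer_form([1, Fraction(-3, 2), 4]))
```

Both inputs describe the same equation up to a constant multiplier and reduce to the same list of integers, which is what removes the ambiguity in ordering the equations.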

In the arrangement of such equations so that they corre- 
spond one by one with the positive whole numbers, we should 
begin by rationalizing each equation if it already contains 
roots. Then we should group the equations so that those 
with equal values of the sum of the order, the degree, and the
absolute values of the coefficients, were classed together. The
number in each group is finite. We should then arrange the 
groups according to increasing values of this sum, and adopt 
some convention regarding the arrangement of the equations 
in the same group. Thus the equations occurring early in the 
series would have low order and degree, and the numerical 
coefficients in them would be small integers. They would 
therefore be simple, as the term is generally understood. We
may indeed give a precise definition of the complexity of an 
equation by saying that it is the sum of the order, the degree, 
and the absolute value of the coefficients. If the complexity 
is thus defined, it is a determinate mathematical problem to 
say how many differential equations have complexity less 
than or equal to n. But it is difficult. When n is large the
number is certainly larger than $2^n$; I have not obtained a
closer estimate. This, however, is enough for present purposes.
The series $\sum_{n=1}^{\infty} 1/n$ does not converge. Hence the total
probability of the laws of complexity n must decrease
faster than 1/n, and any one individually must have a prior
probability less than $2^{-n}/n$.

It may be objected that some of the arbitrary constants in- 
volved in the solutions of the differential equations of physics 
are capable of continuous variation within definite ranges, 
and that therefore the true number of solutions is C, and we 
are no further forward. The reply is that the differential 
form, not the integrated one, is the fundamental physical 
law. The arbitrary constants, so called by mathematicians, 
are not arbitrary in physics; they are determined by the 
boundary conditions, and it seems that these conditions, in 
their fundamental forms, involve no more arbitrariness than 
the differential equations. 

It was also objected, when this suggestion was first made 
in a slightly different form, that the restriction to differential 
equations was inconsistent with the ideas about the quantum 
theory that prevailed in 1921. My own view at the time was
that the orbits in an atom, in a stationary state, are describable
by differential equations of the usual type (this was then
the current opinion) and that the quantum jumps, involving 
discontinuous changes of velocity, should be regarded as 
boundary conditions. But there have been many quantum 
theories since then. Those of Heisenberg and Dirac appear 
to have replaced both the ultimate differential equations and 
the conditions of the quantum jumps by finite difference 
equations; and there is no objection to supposing that the 
ultimate laws are finite difference equations, for these may 
equally well be restricted to a class $\aleph_0$ in number. On the
other hand Schrödinger’s theory makes a single differential
equation account for everything, and is entirely consistent 
with the postulate as it stood. 

4·51. I do not wish, therefore, to maintain that this form
of the simplicity postulate is necessarily the final one. I do
maintain, however, that a postulate restricting the number
of admissible laws to $\aleph_0$ is necessary, and that the prior
probabilities must decrease rapidly with decreasing simplicity. 
Modifications of the present form may be needed to admit 
such systems as Dirac’s; also in the laws that appear in the 
general theory of relativity, and indeed in elasticity, the 
simplicity of the symmetry relations may compensate for 
the large numbers of terms in the equations. Meanwhile 
the present form will serve our purposes. 

Our everyday ideas on this subject, as in most others, are 
a complicated system based in part on experience and in 
part on principles believed independently of experience. 
The latter we call a priori. To disentangle the latter we 
have to argue backwards, just as in logic the discovery of 
the primitive postulates was subsequent to a great develop- 
ment of mathematics by forward reasoning. By analysing 
the processes involved in our forward scientific reasoning 
we detect the fundamental postulate that it is possible to 
learn from experience. This is a primitive postulate, presumably
on the frontiers between a priori and empirical
knowledge. The status of the laws of probability and the 
simplicity postulate is that of inferences from this principle. 

4·6. The variables in the differential or difference equations
include the time and the co-ordinates of position. These are
still generally believed capable of continuous variation. But
these are not real variables, but apparent variables. When we
assert, for instance, Laplace’s equation

$$\frac{\partial^2 V}{\partial x^2} + \frac{\partial^2 V}{\partial y^2} + \frac{\partial^2 V}{\partial z^2} = 0,$$

we are not implying a choice among an infinite set of laws
in any one of which x, for instance, may have a value chosen
from a continuous set of possibilities. We assert that for all
values of x, y, z corresponding to points outside matter this
differential equation is satisfied by the potential. In the
language of mathematical logic this equation should be written
as follows.

Whatever P, x, y, z, V may be, if V is the gravitation potential
at P, a point outside matter with co-ordinates (x, y, z), then

$$\frac{\partial^2 V}{\partial x^2} + \frac{\partial^2 V}{\partial y^2} + \frac{\partial^2 V}{\partial z^2} = 0.$$

When a symbol is given all its possible values, and the 
differential equation is asserted for all of them, the symbol is 
only an apparent variable. There is no objection to apparent 
variables being capable of continuous variation ; what matters 
is the form of the law, not the actual values of the variables 
in particular verifications. Similarly Poisson’s equation at
points inside matter could be written as follows.

Whatever P, x, y, z, $\rho$, V may be, if V is the gravitation
potential and $\rho$ the density at P, a point with co-ordinates
(x, y, z), there is a constant f such that

$$\frac{\partial^2 V}{\partial x^2} + \frac{\partial^2 V}{\partial y^2} + \frac{\partial^2 V}{\partial z^2} = -4\pi f \rho.$$

There is no difficulty, similarly, about the fact that density
and mass may appear in our equations and are apparently 
capable of continuous variation ; for we assert the laws for all 
their values, and they are only apparent variables. 

4·7. The question of the probability to be attached to a quantitative
inference can now be dealt with. If p is the most
probable law on the data at any stage, and q an additional
experimental fact, we have

$$\frac{P(p \mid q.h)}{P(\sim p \mid q.h)} = \frac{P(q \mid p.h)\,P(p \mid h)}{P(q \mid \sim p.h)\,P(\sim p \mid h)}.$$

By the hypothesis we have just made about the prior probabilities
of laws, $P(p \mid h)/P(\sim p \mid h)$ is not very small. If q
be implied by p, we have $P(q \mid p.h) = 1$, while if the contradictory
of p gives no particular inference about the truth
of q, $P(q \mid \sim p.h)$ may be very small, especially if q involves
accurate measurement. Hence, even if p has not a very large
probability already, a single verification of a consequence
not predicted by its contrary may make

$$P(p \mid q.h)/P(\sim p \mid q.h)$$

enormous, and therefore the posterior probability of p is
nearly 1. In such circumstances the probability of a further
inference $q'$ from the law is practically that of the law itself,
since the second term in the equation

$$P(q' \mid q.h) = P(p \mid q.h)\,P(q' \mid p.q.h) + P(\sim p \mid q.h)\,P(q' \mid \sim p.q.h)$$

is the product of two small factors. In such inference, then, 
there is no advantage to be gained by proceeding directly 
from the data to the further inference rather than by way of 
the general law, as has sometimes been suggested. 
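
A numerical illustration of this argument, under assumed values: the law p starts with a modest prior probability of 0.1, the verified consequence q is implied by p, and q has probability 0.0001 if p is false. These numbers, and the probability 0.01 assigned to the further inference when p is false, are assumptions chosen only for illustration; the function names are hypothetical.

```python
def posterior_probability(prior, q_given_p, q_given_not_p):
    """Posterior of p after verifying q, from the ratio of probabilities."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * q_given_p / q_given_not_p
    return posterior_odds / (1 + posterior_odds)


def further_inference(posterior, next_given_p, next_given_not_p):
    """Probability of a further inference: the two-term sum over p and not-p."""
    return posterior * next_given_p + (1 - posterior) * next_given_not_p


# Assumed illustrative values.
after_one_test = posterior_probability(0.1, 1.0, 1e-4)
next_prediction = further_inference(after_one_test, 1.0, 0.01)
print(after_one_test, next_prediction)
```

A single verification carries the law from 0.1 to nearly 1, and the probability of the further inference differs from that of the law only by the product of two small factors.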

It will be noticed that the argument in the last paragraph
depends on the smallness of $P(q \mid \sim p.h)$. If, however, $\sim p$
involves a moderately probable law which also leads to q as
a consequence, this probability will not be small, and the
probability of p after the verification will stand to that of this
alternative law in almost the same ratio as before. A new 
crucial test will be required to decide the issue between them. 

If we have to decide between different simple laws, the 
prior probabilities of which are in any case moderate, the 
high posterior probability of a law arises from its verification. 
If a law $p_1$ implies that the measure of a length will be between
15·7 and 15·8 cm., and this is found to be true, then there is
no posterior probability for a law that said it would be
45·0 cm. and very little for one that said that it might be anything
from zero to a metre. So long as two laws are not
widely separated in the order of simplicity, the decision be- 
tween them rests on the quantitative tests and not on the 
prior probability. 

But when the laws are widely separated in the order of in- 
creasing complexity the prior probability is all-important, 
even when the known facts would fit either. An important 
case that has arisen in practice is that of small variations in 
the numerical constants in fundamental laws. Suppose that 
a law contains a numerical constant 2, and that we propose to 
alter this constant to Then in accordance with our 

principle that a law must be cleared of fractions before it is 
placed in the order of descending probability, this law will 
now have to be treated as if it contained numerical coefficients 
running into millions, and its position in the series will be 
millions, probably billions, of places later than before. Its 
prior probability is accordingly insignificant. In fact a small 
change in a numerical coefficient is not a trivial matter; from 
the point of view of the prior probability of the law it is the 
most drastic change that can be made. As an example, we 
may consider the inverse square law of force in electrostatics, 
in which the index has been shown experimentally to be -2 
within 1/21600. Then the only law within the admissible range 
that has an appreciable prior probability is the exact inverse 
square one, and it is unnecessary to consider any others. 
Similarly, we could discard at sight the suggestion that the 
perihelion of Mercury could be explained if the attraction of 
the sun varied inversely as the 2.000,000,016 power of the 
distance instead of as the exact inverse square. The exiguous 
prior probability of such a law puts it beyond consideration, 
apart from the inconsistency with the observed motion of 
the moon’s perigee that led to its abandonment. In fact the 
law established with a high probability by experience is not 
an approximation to the simple law, but the exact simple law 
itself. Consequently extrapolation over an indefinitely wide 
range can be carried out with the full probability of the law. 
This is the justification of the inferences concerning con- 
ditions at the centre of the earth or millions of years ago 
that form so large a part of geophysics and cosmogony. 
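
The effect of clearing a law of fractions can be seen with exact rational arithmetic. This is a hedged sketch only: the helper `cleared_coefficients` is hypothetical, and the size of the resulting integers is used as a crude stand-in for a law's place in the order of simplicity, which is defined elsewhere in the book and is not reproduced here.

```python
from fractions import Fraction

def cleared_coefficients(constant: str):
    """Return the integer pair (numerator, denominator) obtained when the
    constant is written as a fraction in lowest terms."""
    f = Fraction(constant)
    return f.numerator, f.denominator

exact = cleared_coefficients("2")
altered = cleared_coefficients("2.000000016")
print(exact, altered)
```

The exact index needs only the integers 2 and 1, whereas the slightly altered index needs integers of the order of a hundred million once fractions are cleared.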

The rapidity with which an exact quantitative law can be 
established depends, then, first on its being sufficiently simple 
to have a moderate prior probability, and second, on its 
power to make precise predictions that can be tested. Subject 
to these two conditions a theory of quantitative inference 
can be constructed that will fully explain the confidence that 
physicists show in their predictions. 





CHAPTER V 


ERRORS 

A snapper-up of unconsidered trifles. 

Shakespeare, A Winter's Tale 

5.1. We saw that when the physicist found that the displacement 
of his solid down the inclined plane varied nearly 
as the square of the time from the start he would adopt the 
exact square law as a statement of the facts, in spite of the 
existence of more complicated laws that would fit the ob- 
servations exactly; and we have shown how this procedure 
can be justified on the basis of the low prior probability of 
complicated laws, which renders them unreliable for the 
purpose of inference. Nevertheless he might not allow the 
matter to end there. He might seek for explanations of the 
departures of the observed values from those calculated from 
the law. In some sense the square law is true; but the 
quantities that satisfy it are not quite the data of observation. 
The physicist would say that it was impossible to measure 
the time absolutely accurately, because the watch could not 
be read to less than a fifth of a second ; there was also some 
possibility of inaccuracy in measuring the position of a 
moving object; the watch and the position of the solid were 
not observed at precisely the same instant, since some time 
would elapse in looking from one to the other; and possibly 
the slope of the plane was not exactly uniform. Having re- 
duced his observations he would say with confidence that the 
acceleration of a body of the actual form rolling down a uni- 
formly inclined plane was constant ; this would be his general 
law for the experiment, which he could extend to bodies of 
different design and to planes with different slopes. He would 
on the other hand recognize that exact verification of the 
law would require conditions not realized in the actual 
experiment, and that the departures from the law had some 
explanation. 

The physicist’s attitude to observations is not the naif 
realism attributed to him by some philosophers, which would 
make every observation a perfect statement of a fact about the 
real world. It is essentially a critical realism. There is a 
belief that there are true values of the quantities that he sets 
out to measure, but it is not believed that the observed values 
are anything but an approximation to these true values, 
which are in the last resort unknowable. The differences 
between the true and observed values are called errors. 

In practice, not knowing the true values, we compromise. 
When we have a number of observations of one or more 
variables, a simple law is found to fit them approximately. 
The case of a single measurement carried out several times 
may be brought under this head, the law involved being 
merely one of constancy with regard to the time. The law 
may involve some parameters not known already, and it will 
usually be impossible, however these parameters are chosen, 
to make the law fit all the observations exactly. But we can 
choose them so as to fit the observations as closely as possible, 
though it is largely a matter of convention what criterion we 
adopt to measure the closeness of the fit. When we have chosen 
one such criterion the parameters in the law and the values 
of the function are unique. We call these the adopted values. 
In general they will differ from the true values, but will be 
nearer to them than the observed values. The differences 
between the adopted and observed values are called residuals. 
The differences between the true and adopted values are the 
errors of the adopted values. 
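
The distinction between observed values, adopted values and residuals can be sketched for the rolling solid. Everything specific here is assumed for illustration: the data are invented, least squares is taken as the conventional criterion of closeness, and residuals are signed as observed minus adopted.

```python
# Hypothetical observations: time t (s) and displacement s (cm).
times = [1.0, 2.0, 3.0, 4.0]
observed = [2.1, 7.9, 18.2, 31.8]

# Convention assumed here: least squares for the one-parameter law s = a * t**2.
a = sum(s * t**2 for t, s in zip(times, observed)) / sum(t**4 for t in times)

adopted = [a * t**2 for t in times]                      # values that fit the law exactly
residuals = [o - v for o, v in zip(observed, adopted)]   # observed minus adopted
print(a, residuals)
```

With this convention fixed, the parameter and the adopted values are unique; a different criterion of closeness would give slightly different adopted values from the same observations.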

In general the procedure may be summed up as follows. 
The observed values are found; they exist because they are 
measured, and there is nothing more to be said. A simple 
law is found to fit them approximately. This is a statement 
of fact. Then by a conventional process we find adopted 
values close to the observed values that fit the law exactly. 
So far as the convention is at our disposal the adopted values 
have some arbitrariness, but with a given convention they 
are unique. The adopted values therefore exist. The existence 
of the true values, however, is a postulate, the validity of 
which will have to be examined. We notice at present that 
the observed values are more fundamental in experience 
than the simple law, that the simple law is more fundamental 
than the adopted values, and that the whole process of finding 
the adopted values could be carried out equally well if there 
were no such things as true values. 

Provisionally we shall assume that the true values exist, 
that the exact simple law refers to certain specifiable condi- 
tions, and that the errors arise from the fact that the actual 
conditions of the experiment differ to some extent from these 
ideal ones. The practical justification for this assumption is 
that it is actually found that the more closely these condi- 
tions are realized the more accurately the simple law fits the 
observations, though it never fits them exactly. The ideal 
conditions always reduce to the removal of unconsidered vari- 
ables. Thus in the problem of the rolling solid we should con- 
struct the plane so as to have as nearly uniform a slope as 
possible, and we should substitute electrical recording devices 
to record the time and displacement simultaneously instead 
of relying on eye observations. The aim is to make the time 
the only independent variable, and thereby to remove varia- 
tions of the displacement that may be due to variations in 
anything but the time. It is here that causality enters: if 
when y is kept constant, x = f(t), and if when y varies x 
differs from f(t), the changes in x are said to be caused by 
the changes in y. This is the practical definition of causality. 
In the actual experiment the errors are said to be caused by 
the unconsidered disturbing factors. 

The most fundamental type of error is inaccuracy of 
measurement. Observed values are never capable of taking 
all values of a compact set ; in making a measurement we read 
the instrument to the nearest multiple of a certain constant, 
which we call the step of the instrument. Thus in measuring 
the position of a mark on a scale we may read to the nearest 
hundredth of a centimetre; in observing an instant of time 
we give it to, say, the nearest fifth of a second. 

5.2. The possible observed values in the one case are multiples 
of a hundredth of a centimetre, in the other, of a fifth 
of a second. The true value is not in general a possible ob- 
served value, since the true values of most variable quantities 
vary continuously. Hence there is an error of observation 
equal to the difference between the true value and the nearest 
observable value, which may in an extreme case be half the 
step of the instrument. Such an error, for a given true value, 
is systematic; that is, it is always the same however often we 
repeat the measurement. 
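
A short sketch of why this error is systematic. The true instant and the helper `read_instrument` are hypothetical, and ties are sent upward by an arbitrary choice.

```python
import math

def read_instrument(true_value, step):
    """Nearest multiple of the step (ties go upward, an arbitrary choice)."""
    return step * math.floor(true_value / step + 0.5)

step = 0.2          # a watch read to a fifth of a second
true_value = 3.27   # hypothetical true instant, in seconds
readings = [read_instrument(true_value, step) for _ in range(5)]
errors = [r - true_value for r in readings]
print(readings, errors)
```

Every repetition returns the same reading and hence the same error, which never exceeds half the step in magnitude.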

5.21. Suppose next that we wish to measure a length by 
means of a scale. Take the step of the measuring scale as the 
unit of length and suppose that the true length is $n + x$, 
where n is a whole number and x is between $\pm\tfrac12$. The object 
is placed on the scale in an arbitrary position, and the 
positions of its ends are read. One end is at $m + y$, where m is an 
integer and y is between $\pm\tfrac12$. Then the other end is at 
$m + n + x + y$. The position of the first end is then read in 
any case as m units. That of the other is read as $m + n - 1$ if 
$x + y$ is less than $-\tfrac12$, as $m + n$ if $x + y$ is between $-\tfrac12$ and 
$+\tfrac12$, and as $m + n + 1$ if $x + y$ is greater than $\tfrac12$. x is fixed, 
but y is equally likely to have any value from $-\tfrac12$ to $+\tfrac12$. 
Thus the probability of a value of y between $y_1$ and $y_2$ is 
$y_2 - y_1$. We see that the measured length will be 

$n - 1$ if $y < -\tfrac12 - x$, 

$n$ if $-\tfrac12 - x < y < \tfrac12 - x$, 

$n + 1$ if $\tfrac12 - x < y$. 

If x is positive the first alternative cannot arise, since y 
cannot be less than $-\tfrac12$; for the second, the range of values 
of y is from $-\tfrac12$ to $\tfrac12 - x$, or $1 - x$ in all; for the third, the 
range is x. Hence if x is positive the probability of an 
observed length equal to n units is $1 - x$, and that of one equal 
to $n + 1$ units is x. Similarly if x is negative the probability 
of an observed length equal to n units is $1 + x$, and that of 
one of $n - 1$ units is $-x$. In each case the possible measured 
lengths are the two multiples of the step adjacent to the true 
length. 
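
The two-valued result can be checked by direct enumeration. This is a sketch under stated assumptions: the particular values of n, x and m are invented, and a fine regular grid of placements y stands in for "equally likely to have any value".

```python
import math

def nearest(v):
    """Reading to the nearest multiple of the unit step."""
    return math.floor(v + 0.5)

def measured_length(n, x, m, y):
    """Difference of the two end readings for true length n + x,
    with the first end placed at m + y."""
    return nearest(m + y + n + x) - nearest(m + y)

n, x, m = 7, 0.3, 2
N = 10_000
# y taken on a regular grid over (-1/2, 1/2).
ys = [-0.5 + (k + 0.5) / N for k in range(N)]
lengths = [measured_length(n, x, m, y) for y in ys]
freq = {L: lengths.count(L) / N for L in sorted(set(lengths))}
print(freq)
```

Only the two multiples of the step adjacent to the true length occur, and the upper one occurs in a fraction x of the placements.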


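5.3. Now consider the case where a large number of 
independent contributory causes affect the observed value. 
Suppose that the error $\xi$ is given by 

$$\xi = a_1\epsilon_1 + a_2\epsilon_2 + \dots + a_n\epsilon_n, \qquad (1)$$

where $\epsilon_1, \epsilon_2, \dots, \epsilon_n$ vary independently. Suppose that 
the probability that in any given trial $\epsilon_r$ will lie in a given 
range $d\epsilon_r$ is $E_r(\epsilon_r)\,d\epsilon_r$. Then the probability of a set of values 
within ranges $d\epsilon_1, d\epsilon_2, \dots, d\epsilon_n$ is 

$$E_1(\epsilon_1)\,E_2(\epsilon_2)\dots E_n(\epsilon_n)\,d\epsilon_1\,d\epsilon_2\dots d\epsilon_n. \qquad (2)$$

We require the probability that $\xi$ shall lie in a range $\xi_1$ to $\xi_2$. 
This is 

$$I = \iint\dots\int E_1(\epsilon_1)\,E_2(\epsilon_2)\dots E_n(\epsilon_n)\,d\epsilon_1\dots d\epsilon_n, \qquad (3)$$

where the range of integration is such that all values of each 
variable are permitted, subject to 

$$\xi_1 < a_1\epsilon_1 + \dots + a_n\epsilon_n < \xi_2. \qquad (4)$$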


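Now Heaviside's unit function $H(\xi)$, which is equal to 0 for 
$\xi$ negative and 1 for $\xi$ positive, is given by* 

$$H(\xi) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}\frac{e^{\kappa\xi}}{\kappa}\,d\kappa. \qquad (5)$$

Also 

$$H(\xi - \xi_1) - H(\xi - \xi_2) = 1 \quad \text{if } \xi_1 < \xi < \xi_2, \qquad (6)$$

and otherwise = 0. Then 

$$I = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}\int_{-\infty}^{\infty}\dots\int_{-\infty}^{\infty}\frac{e^{\kappa(a_1\epsilon_1+\dots+a_n\epsilon_n-\xi_1)} - e^{\kappa(a_1\epsilon_1+\dots+a_n\epsilon_n-\xi_2)}}{\kappa}\,E_1(\epsilon_1)\,E_2(\epsilon_2)\dots E_n(\epsilon_n)\,d\kappa\,d\epsilon_1\dots d\epsilon_n, \qquad (7)$$

where the $\epsilon$'s may now range over all real values independently. 

* Jeffreys, Operational Methods in Mathematical Physics, 1927. 
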
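Now put 

$$\int_{-\infty}^{\infty} E_r(\epsilon_r)\,e^{a_r\kappa\epsilon_r}\,d\epsilon_r = \Omega_r(a_r\kappa). \qquad (8)$$

Then 

$$I = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}\frac{e^{-\kappa\xi_1} - e^{-\kappa\xi_2}}{\kappa}\,\Omega_1(a_1\kappa)\,\Omega_2(a_2\kappa)\dots\Omega_n(a_n\kappa)\,d\kappa. \qquad (9)$$

Now replace $\xi_1$ by $\xi$ and $\xi_2$ by $\xi + d\xi$, and put 

$$\Omega(\kappa) = \Omega_1(a_1\kappa)\,\Omega_2(a_2\kappa)\dots\Omega_n(a_n\kappa). \qquad (10)$$

Then I, the probability that $\xi$ lies in a given range $d\xi$, becomes 
$P(\xi)\,d\xi$, where 

$$P(\xi) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} e^{-\kappa\xi}\,\Omega(\kappa)\,d\kappa. \qquad (11)$$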

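Now 

$$\Omega_r(\theta) = \int_{-\infty}^{\infty} e^{\theta\epsilon_r}E_r(\epsilon_r)\,d\epsilon_r = \int_{-\infty}^{\infty}(1 + \theta\epsilon_r + \dots)\,E_r(\epsilon_r)\,d\epsilon_r = 1 + \sum_{k=1}^{\infty}\frac{\theta^k}{k!}\,s_{rk}, \qquad (12)$$

where 

$$s_{rk} = \int_{-\infty}^{\infty}\epsilon_r^{\,k}\,E_r(\epsilon_r)\,d\epsilon_r. \qquad (13)$$

Now form $\log\Omega_r(\theta)$, so that 

$$\log\Omega_r(\theta) = \sum_{k=1}^{\infty}\frac{\theta^k}{k!}\,p_{rk}. \qquad (14)$$

Then 

$$\log\Omega(\kappa) = \sum_{r=1}^{n}\sum_{k=1}^{\infty}\frac{a_r^{\,k}\kappa^k}{k!}\,p_{rk} = \sum_{k=1}^{\infty}\frac{\kappa^k}{k!}\,P_k, \qquad (15)$$

and 

$$P(\xi) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}\exp\Big(-\kappa\xi + \sum_{k=1}^{\infty}\frac{\kappa^k}{k!}\,P_k\Big)\,d\kappa. \qquad (16)$$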

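So far nothing has been assumed about the quantities $s_{rk}$ 
except that $E_r(\epsilon_r)$ decreases for large absolute values of $\epsilon_r$ 
with sufficient rapidity to make the various integrals and 
series converge. We can, however, make all the $s_{r1}$ zero by a 
change of variable. For if this relation is not already 
satisfied we take a new $\epsilon_r'$ equal to $\epsilon_r - s_{r1}$; then 

$$\int_{-\infty}^{\infty} E_r(\epsilon_r)\,(\epsilon_r - s_{r1})\,d\epsilon_r = 0, \qquad (17)$$

since 

$$\int_{-\infty}^{\infty} E_r(\epsilon_r)\,d\epsilon_r = 1, \qquad (18)$$

it being certain that $\epsilon_r$ lies between $\pm\infty$. If then we use $\epsilon_r'$ 
instead of $\epsilon_r$ the new $s_{r1}$ is zero. Then $p_{r1}$ and $P_1$ are 0. Also 
we define the mean square or standard value of $\epsilon_r'$, $\sigma_r$, by 

$$\sigma_r^{\,2} = \int_{-\infty}^{\infty}\epsilon_r'^{\,2}\,E_r(\epsilon_r)\,d\epsilon_r. \qquad (19)$$

We need no longer write accents, all component errors 
being supposed transformed in this way. We see that $\sigma_r^{\,2}$ is 
always positive; by convention we give $\sigma_r$ the positive sign. 
Then 

$$p_{r2} = s_{r2} = \sigma_r^{\,2}, \qquad (20)$$

$$P(\xi) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}\exp\Big(-\kappa\xi + \tfrac12 P_2\kappa^2 + \sum_{k=3}^{\infty}\frac{\kappa^k}{k!}\,P_k\Big)\,d\kappa. \qquad (21)$$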

The integral is in a form suitable for evaluation by the method 
of steepest descents. If we omit the terms with $k \ge 3$, there 
is a saddle point where 

$$\kappa = \xi / P_2, \qquad (22)$$

and the path of steepest descent is parallel to the imaginary 
axis, since $P_2$ is real and positive. Hence the integral reduces to 

$$P(\xi) = (2\pi P_2)^{-1/2}\exp(-\xi^2 / 2P_2). \qquad (23)$$

Appreciable contributions to the integral arise only for values 
of $\kappa - \xi/P_2$ of order $P_2^{-1/2}$ at most. 

In most ordinary cases $E_r(\epsilon_r) = 0$ or is insignificant for 
values of $\epsilon_r$ much greater than $\sigma_r$. Then $s_{rk}$ is of order $\sigma_r^{\,k}$; 
so is $p_{rk}$; and 

$$P_k = \sum_{r=1}^{n} a_r^{\,k}\,p_{rk} = O\Big\{\sum_{r=1}^{n}(a_r\sigma_r)^k\Big\}, \qquad (24)$$

and is of order $n a^k\sigma^k$ if the $a_r\sigma_r$ are all comparable. Within 
the range where $\exp(-\kappa\xi + \tfrac12 P_2\kappa^2)$ is appreciable, then, 

$$P_k\kappa^k = O(n a^k\sigma^k)\,\{\xi/P_2 \pm (P_2)^{-1/2}\}^k, \qquad (25)$$

and if $\xi$ is not large compared with $P_2^{1/2}$ this is 

$$O(n a^k\sigma^k)\,(P_2)^{-k/2} = O(n a^k\sigma^k)/(n a^2\sigma^2)^{k/2} = O(n^{1-k/2}). \qquad (26)$$

If then n is large, $P_k\kappa^k$ is small throughout the neighbourhood 
of the saddle point for all values of $k > 2$; and then (23) is a close 
approximation to the true value of $P(\xi)$. Further, it makes 

$$\int_{-a}^{a} P(\xi)\,d\xi = 1, \qquad (27)$$

nearly, where a is a moderate multiple of $(2P_2)^{1/2}$; and therefore 
values of $\xi$ outside the range $\pm a$ have an insignificant 
probability*. 

We notice that the proof depends for its validity on the 
condition that when $k \ge 3$, $P_k$ is small compared with $(P_2)^{k/2}$, or 

$$\sum_{r=1}^{n}(a_r\sigma_r)^k \ \text{is small compared with}\ \Big(\sum_{r=1}^{n}a_r^{\,2}\sigma_r^{\,2}\Big)^{k/2}. \qquad (28)$$

If all the $a_r\sigma_r$ are equal or comparable, and n is large, this is 
true. But if one of them, for r = m say, is so large as to 
contribute the greater portion of $P_2$, then $P_2$ is of order $a_m^{\,2}\sigma_m^{\,2}$, 
$P_k$ is nearly $a_m^{\,k}p_{mk}$, and $P_k$ is of the same order as $(P_2)^{k/2}$. 
In such a case the normal law breaks down; it is indeed 
obvious that then $a_m P(a_m\epsilon_m)$ is nearly $E_m(\epsilon_m)$. But when 
the contributions to $P_2$ arise in comparable amounts from a 
large number of the component errors the condition is true, 
and the normal law holds. 

* This discussion is taken mainly from Whittaker and Robinson's 
Calculus of Observations, which gives references to earlier writers. It has 
been modified by the introduction of Heaviside's unit function and the 
method of steepest descents. The consideration (27) is part of the proof. 
Our argument shows that in the specified conditions the terms in $P_k$ for 
$k \ge 3$ do not matter on the path through the saddle point considered, by 
the usual considerations involved in the method. But if $\xi$ was large 
compared with $P_2^{1/2}$ our equation (26) would not hold. Then, however, we 
can use the fact that the total probability of all values of $\xi$ is 1, and (27) 
shows that nearly all of it arises from such values as do make the 
approximation valid; when the probability is appreciable our result is correct, and 
when it is inappreciable our result is still correct. 
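
The condition on the higher coefficients can be illustrated for k = 4. The sketch below assumes, purely for the example, that every component error is uniformly distributed over (-1, 1) before being multiplied by its coefficient; for such a component the second and fourth cumulants are a²/3 and -2a⁴/15. The function name `cumulant_ratio` is hypothetical.

```python
def cumulant_ratio(scales):
    """|P_4| / P_2**2 for independent components a_r * e_r,
    each e_r assumed uniform on (-1, 1)."""
    p2 = sum(a**2 / 3 for a in scales)         # second cumulants add
    p4 = sum(-2 * a**4 / 15 for a in scales)   # fourth cumulants add
    return abs(p4) / p2**2

comparable = cumulant_ratio([1.0] * 100)           # a hundred comparable sources
dominated = cumulant_ratio([10.0] + [1.0] * 99)    # one source dominating the rest
print(comparable, dominated)
```

With a hundred comparable sources the ratio is about 0.012, so the fourth-order term is negligible; when one source contributes the greater part of the variance the ratio is some twenty-five times larger and the normal approximation is no longer to be trusted.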

Suppose that a large number of the $\epsilon_r$ are of comparable 
importance, that is, that the probable range of variation of 
$a_r\epsilon_r$ is of the same order of magnitude for all of them, and that 
the others give smaller contributions to $\xi$. Then it is shown 
that the probability that $\xi$ lies in a range $d\xi$ is $P(\xi)\,d\xi$, where 

$$P(\xi) = \frac{h}{\sqrt{\pi}}\exp(-h^2\xi^2), \qquad (29)$$

and h is a constant called the modulus of precision. This is 
called the normal law of errors. The probability of an error 
between 0 and $\xi$ is 

$$\int_0^{\xi} P(\xi)\,d\xi = \tfrac12\,\mathrm{erf}\,(h\xi). \qquad (30)$$

When $h\xi$ is equal to 0.477, $\mathrm{erf}\,h\xi = \tfrac12$. The corresponding 
value of $\xi$, equal to 0.477/h, is called the probable error; it has 
the property that the error is as likely to fall short of it as to 
exceed it. The mean square or standard error is defined by 

$$\sigma^2 = \int_{-\infty}^{\infty}\xi^2 P(\xi)\,d\xi = \frac{1}{2h^2}. \qquad (31)$$

The probable error is 0.674 times the standard error. 
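
The two numerical constants quoted for the normal law can be verified with the standard error function. This is a minimal sketch; the value chosen for h is arbitrary, the helper `probable_error` is hypothetical, and the standard error is taken as 1/(h√2), which follows from the normal law written with modulus of precision h.

```python
import math

def probable_error(h, tol=1e-12):
    """Solve erf(h * xi) = 1/2 for xi by bisection."""
    lo, hi = 0.0, 10.0 / h
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if math.erf(h * mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

h = 2.0                               # arbitrary modulus of precision
sigma = 1.0 / (h * math.sqrt(2.0))    # standard error for the normal law
pe = probable_error(h)
print(h * pe, pe / sigma)
```

The two printed ratios do not depend on the value chosen for h.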

5.31. There has been much discussion about the validity of 
the normal law of error. It on the whole follows the same 
lines as that associated with Laplace’s theory of sampling: 
just as it is doubted whether there is any reason to believe 
Laplace’s assumption that all compositions of the original 
class are equally probable, so it is doubted whether there is 
any reason to believe that errors are actually distributed 
according to the normal law. The solution in both cases 
seems to be much the same. If certain conditions are satisfied 
the normal law is definitely right; in other cases it is definitely 
untrue. We have already had two simple cases where it is 
untrue. When an observation is made to the nearest multiple 
of the step of the instrument the error is the difference between 
the true value and that multiple, and is always the same. 
When a length or an interval of time is measured as the 
difference between two measures each made to the nearest 
multiple of the step, the possible observed values are the two 
nearest multiples of the step, and no others. In each case 
the normal law is simply inapplicable. But when the error 
arises as the resultant of a large number of independent errors 
of comparable importance the normal law is right. Two such 
cases are common. 

5.32. Suppose that we make several observations of the 
same kind, of number n, and that we take the mean. Then 
each observation is liable to an error of the same magnitude, 
and the standard value of each is the same. The mean is 
1/n times the sum of the individual errors, so that each $a_r$ in 
the foregoing discussion is 1/n, and 

$$P_2 = \sum_{r=1}^{n} a_r^{\,2}\sigma_r^{\,2} = \sigma^2/n. \qquad (1)$$

The conditions for the validity of the normal law hold if n is 
large. If for instance we consider the measure of a length, 
when the step is unity, and the true value is an integer + $x_r$, 
where $x_r$ is positive, $E_r(\epsilon_r) = 0$ unless $\epsilon_r$ is either $-x_r$ or 
$1 - x_r$. The probability that $\epsilon_r$ is $-x_r$ is $1 - x_r$; the 
probability that $\epsilon_r$ is $1 - x_r$ is $x_r$. Then $\int_a^b E_r(\epsilon_r)\,d\epsilon_r = 0$ unless 
the range a to b includes either $-x_r$ or $1 - x_r$; if it does 
include $-x_r$ the integral is $1 - x_r$; if it includes $1 - x_r$ the 
integral is $x_r$, however short the range may be*. 

Then 

$$\sigma_r^{\,2} = \int_{-\infty}^{\infty}\epsilon_r^{\,2}\,E_r(\epsilon_r)\,d\epsilon_r = (1 - x_r)\,x_r^{\,2} + x_r\,(1 - x_r)^2 = x_r(1 - x_r). \qquad (2)$$

* Stieltjes integrals are understood. Cf. Hobson, Theory of Functions 
of a Real Variable, 1, 507. 

When n is large and the $x_r$ are regularly distributed from 
0 to 1, 

$$\sum\sigma_r^{\,2} = n\int_0^1 x(1 - x)\,dx, \ \text{nearly}, \ = \tfrac16 n, \qquad (3)$$

provided n is large enough for the theory of sampling to be 
applicable. Then 

$$P_2 = \frac{1}{n^2}\sum\sigma_r^{\,2} = \frac{1}{6n}. \qquad (4)$$

Strictly speaking the possible values of the mean are all 
multiples of the step divided by n, but this gives no trouble 
provided that we consider only the probabilities of errors 
within ranges greater than 1/n. 
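
The mean square error of a single difference reading, and its average when the fractional parts are spread regularly over the unit range, can be checked numerically. This is a sketch only: the helper `error_moments` is hypothetical, and a regular grid of fractional parts is assumed in place of "regularly distributed".

```python
def error_moments(x):
    """Mean and mean square of the error of a difference reading with
    fractional part x: error -x with probability 1 - x, error 1 - x
    with probability x."""
    mean = (1 - x) * (-x) + x * (1 - x)
    mean_square = (1 - x) * x**2 + x * (1 - x) ** 2
    return mean, mean_square

n = 1000
xs = [(k + 0.5) / n for k in range(n)]        # fractional parts spread over (0, 1)
total = sum(error_moments(x)[1] for x in xs)  # sum of the mean square errors
p2 = total / n**2                             # each coefficient is 1/n for the mean
print(total / n, p2)
```

The average mean square error comes out close to 1/6, so that the mean of n such measures has a mean square error close to 1/(6n).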

This theory is not applicable if the same length is measured 
several times, for then $\sigma_r$ is always the same and a function 
of x, ranging, by (2), from 0 for x = 0 or 1 to $\tfrac12$ for $x = \tfrac12$. 
The condition that the errors must be independent is then 
not satisfied. We notice that in this case 

$$\int_{-\infty}^{\infty}\epsilon\,E(\epsilon)\,d\epsilon = -(1 - x)\,x + x\,(1 - x) = 0. \qquad (5)$$

5.33. Another case where the normal law appears to hold is 
one where considerable attention has been given to possible 
sources of error and all the most serious ones have been 
traced, as in many astronomical observations. The remaining 
ones then probably contain several just below the limit of 
what can be detected individually, and the normal law will 
hold approximately. 

5.34. The normal law is true unless there are one or a few 
sources of error of sufficient importance to dominate all the 
rest. But if there is a single main source of error we should 
still consider its probable distribution. It may be one of the 
types already considered, arising from the step of the in- 
strument. If so its properties may be considered known. It 
may be the result of a definite mistake on the part of the 
observer, as when an astronomer observing a meridian transit 
makes a miscount of a second. Criteria for detecting such 
mistakes are needed ; at present we notice only that they are 
capable of giving errors of certain discrete values, which are 
multiples of the step. Other factors not allowed for may have 
similar properties ; that is, they affect only a small fraction of 
the observations, but when they do arise they give errors 
larger than are usual. Such errors may be capable of only 
one sign; thus the astronomer may occasionally count too 
few seconds, but never too many. 

There may on the other hand be a single source of error 
capable of giving many different values. There may for in- 
stance be an unknown periodic disturbance. The practical 
solution here is that the periodic character of the residuals is 
noticed, and its amount can be determined by harmonic 
analysis and allowed for ; its cause then becomes a matter for 
independent inquiry. Such a case arose in the discovery by 
Chandler of the 14-monthly and annual terms in the variation 
of latitude. But such individual sources of error may have 
many different distributions of probability; and in practice 
the issue is very like that of assessing the distribution of prior 
probability in the theory of sampling. We start from a state 
of ignorance such that all observed values of the variable are 
equally probable. By experience we build up knowledge that 
the observed values are concentrated in a short range about 
the value given by a simple law, and by studying all our 
previous knowledge about modes of distribution of errors 
we could, given sufficient trouble, assess the probabilities of 
given errors. But the effort would be more trouble than it is 
worth. In practice it is better to take a sufficiently large 
number of observations to make the posterior probability 
practically independent of the prior probability. 

5.4. In practice we are not much interested in the errors as 
such, except in so far as they may show a systematic character 
that may repay special investigation. What we want is the 
true value, and if we cannot find it, we want to choose an 
adopted value as near as possible to it. That is, given the 
observed values, we wish to assess the probabilities that the 
true values may lie in various ranges. The problem therefore 
becomes one of inverse probability, and the prior probabilities 
of different true values must be taken into account. 

5.41. Consider first the case of a single reading made to the 
nearest multiple of the step; the observed value is n, where 
the step is the unit. The true value is $n + x$. Then the prior 
probability that x may lie within a range is proportional to 
the length of the range; if $P(x)\,dx$ is the prior probability 
that x lies within a range dx, $P(x)$ is a constant. The 
probability of getting the reading n is 1 when x is between $\pm\tfrac12$ 
and zero when x is outside that range, for then another 
integral value would be read. Hence the posterior probability 
that x lies within a range dx is 

$$\frac{P(x)\,dx}{\int_{-1/2}^{1/2} P(x)\,dx} = dx \quad \text{when } -\tfrac12 < x < \tfrac12,$$

$$\frac{P(x)\,dx \cdot 0}{\int_{-1/2}^{1/2} P(x)\,dx} = 0 \quad \text{when } |x| > \tfrac12.$$

Thus after the observation, or any number of such 
observations, the posterior probability of x is uniformly distributed 
between $\pm\tfrac12$. 

5.42. Consider next a length or a time interval determined 
by difference. The observed values are l equal to n and m 
equal to $n + 1$. The true value is $n + x$, and $P(x)$ is constant. 
For a given x the probability of a reading n is $1 - |x|$ when 
x is between $\pm 1$ and otherwise zero; the probability of a 
reading $n + 1$ is $1 - |1 - x|$ when $1 - x$ is between $\pm 1$ 
and otherwise zero. If then x was negative the readings $n + 1$ 
would not arise; if x was greater than 1 the readings n would 
not arise. For $0 < x < 1$, the probability of l readings equal 
to n and m equal to $n + 1$ is $(1 - x)^l x^m$. The posterior 
probability that x lies in a range dx is therefore 0 for $x < 0$ 
or $x > 1$, and when $0 < x < 1$ is 

$$\frac{P(x)\,dx\,(1 - x)^l x^m}{\int_0^1 P(x)\,(1 - x)^l x^m\,dx} = \frac{(1 - x)^l x^m\,dx}{B(l + 1,\, m + 1)} = \frac{(l + m + 1)!}{l!\,m!}\,(1 - x)^l x^m\,dx. \qquad (1)$$

The coefficient of dx is a maximum when 

$$x = \frac{m}{l + m}, \qquad (2)$$

so that the mean of the observations is the most probable 
value. Calling this value $x_0$, we find easily that when l and 
m are large the posterior probability is proportional to 

$$\exp\Big\{-\frac{(l + m)^3}{2lm}\,(x - x_0)^2\Big\}\,dx.$$

Thus if we take $x_0$ as the adopted value, the probabilities of 
different true values are distributed about $x_0$ according to the 
normal law, with a standard deviation given by 

$$\sigma^2 = \frac{lm}{(l + m)^3}.$$
We notice the advantage of this method over direct reading. 
When a single quantity has to be measured as the nearest 
multiple of the step, the same observation may be made an 
indefinite number of times without in the least affecting the 
precision of the adopted value. But when it is determined by 
difference and the measure is repeated a large number of 
times, the standard difference between the adopted and true 
values may be reduced indefinitely. 
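
The gain from repetition when a length is determined by difference can be sketched as follows. The counts of readings are invented; the exact posterior, proportional to (1 - x)^l x^m, is a Beta law, and its standard deviation is compared with the large-sample value. The function names are hypothetical.

```python
import math

def normal_approx(l, m):
    """Most probable x and approximate standard deviation for l readings
    equal to n and m readings equal to n + 1."""
    x0 = m / (l + m)
    sd = math.sqrt(l * m / (l + m) ** 3)
    return x0, sd

def exact_sd(l, m):
    """Standard deviation of the exact posterior, a Beta(m + 1, l + 1) law."""
    a, b = m + 1, l + 1
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

for l, m in [(6, 4), (600, 400), (60000, 40000)]:
    x0, sd = normal_approx(l, m)
    print(l, m, x0, sd, exact_sd(l, m))
```

The most probable value stays at the mean of the readings while the standard deviation falls roughly as the inverse square root of the number of readings, in contrast with the single direct reading, for which repetition gives nothing.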


5.43. When the normal law of error applies, we proceed as 
follows. The true value being now taken as x, and the 
observed value as $x + \xi$, then the probability of an observed 
value in a range $d\xi$ is $\dfrac{h}{\sqrt{\pi}}\exp(-h^2\xi^2)\,d\xi$. The probability of a set of 
errors in ranges $d\xi_1, d\xi_2, \dots, d\xi_n$ about $\xi_1, \xi_2, \dots, \xi_n$ is then 

$$\Big(\frac{h}{\sqrt{\pi}}\Big)^n \exp\{-h^2(\xi_1^{\,2} + \xi_2^{\,2} + \dots + \xi_n^{\,2})\}\,d\xi_1\,d\xi_2\dots d\xi_n. \qquad (1)$$

But actually both x and h are initially unknown, and we are 
trying to find x from the observed values. Calling these 
$x_1, x_2, \dots, x_n$, we have 

$$\xi_1 = x_1 - x, \ \dots, \ \xi_n = x_n - x, \qquad (2)$$

$$d\xi_1 = dx_1, \ \dots, \ d\xi_n = dx_n. \qquad (3)$$

If the prior probability that x and h lie simultaneously in 
ranges dx, dh is $P(x, h)\,dx\,dh$, the posterior probability that 
they lie in these ranges is 

$$\frac{P(x, h)\,h^n \exp[-h^2\{(x_1 - x)^2 + \dots + (x_n - x)^2\}]\,dx\,dh}{\int_{-\infty}^{\infty}\int_0^{\infty} P(x, h)\,h^n \exp[-h^2\{(x_1 - x)^2 + \dots + (x_n - x)^2\}]\,dx\,dh}, \qquad (4)$$

the factor $dx_1\,dx_2\dots dx_n$ being the same for all values of 
x and h. 

As usual the posterior probability depends on the prior 
probability. In most cases the prior probability of x is nearly 
uniformly distributed, at any rate over a range several times 
that covered by the observations. We are initially prepared 
for values of x over a wide range, and the purpose of making 
observations at all is to permit a considerable reduction of 
this range. The position is different with regard to h. Initially 
we may have no special views about the probability of one 
value of h rather than another, but we do at least know that 
negative values are excluded, since they would imply negative 
probabilities. Again, x is not usually in fact a number; it is 
usually a length or an interval of time, and h is a reciprocal 
of whatever kind of magnitude x is, while the standard error 
$\sigma$ is the same kind of quantity as x. There seems to be no 
special reason for measuring the precision in terms of h 
rather than $\sigma$, and their product is constant, so that 

$$d\log h + d\log\sigma = 0. \qquad (5)$$

If then $P(x, h)\,dh$ is proportional to $dh/h$ or $d\sigma/\sigma$, an ambiguity 
is removed. It means that the probability of a value of $\sigma$ or h 
within a definite range is proportional to the increase of its 
logarithm; if $h_1/h_2 = h_3/h_4$, h is as likely to lie between $h_1$ and 
$h_2$ as between $h_3$ and $h_4$. The probability of a value of h 
within any range is then independent of any scale of 
measurement; it is distributed in the same way among different values 
whatever our units. If any other function of h was chosen 
we should be assigning a definite prior probability to a value 
of h less than a certain quantity, and this would put a particular 
value of a physical quantity in a privileged position a priori. 
In many cases, then, it seems reasonable to take $P(x, h)$ 
proportional to 1/h. 
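
The independence of units can be sketched directly. The numerical ranges and the function name `prior_weight` are assumptions for illustration, and the weight is left unnormalized.

```python
import math

def prior_weight(h1, h2):
    """Unnormalized prior probability that h lies between h1 and h2
    when the prior probability is proportional to dh / h."""
    return math.log(h2 / h1)

w_cm = prior_weight(0.5, 2.0)        # a range of h expressed per centimetre
w_mm = prior_weight(0.05, 0.2)       # the same range expressed per millimetre
w_shifted = prior_weight(5.0, 20.0)  # a different range with the same ratio of limits
print(w_cm, w_mm, w_shifted)
```

All three weights agree: a change of units, or any other change that preserves the ratio of the limits, leaves the prior probability of the range unaltered.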

This is not, however, quite a complete statement, because 
it makes $\int_0^{\infty} P(x, h)\,dh$ diverge at both limits. To make this 
integral equal to 1 we should therefore have to include a zero 
factor unless very small and very large values of h are 
excluded. This does appear to be the case. We choose the length 
of our scale so that all the measures will be included within it 
easily; that is, all the important values of h are large compared 
with the reciprocal of the length of the scale. Again, if the 
scatter of the observations is comparable with the step of the 
scale, the finiteness of the step is a dominant source of error 
and the normal law does not apply at all. We are therefore 
restricted to a range of values of h that make $\sigma$ large compared 
with the step of the scale and small compared with the length 
of the scale. The range of admissible values of log h is now 
large but finite, and within this range we may suppose their 
prior probabilities distributed uniformly except near the ends. 


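We now introduce the mean value, defined by 

$$n x_0 = x_1 + x_2 + \dots + x_n, \qquad (6)$$

and write 

$$x_1 - x = (x_1 - x_0) + (x_0 - x), \qquad (7)$$

and so on; then 

$$(x_1 - x)^2 + (x_2 - x)^2 + \dots + (x_n - x)^2 = (x_1 - x_0)^2 + (x_2 - x_0)^2 + \dots + (x_n - x_0)^2 + n(x - x_0)^2. \qquad (8)$$

The quantities $x_1 - x_0$ and so on are the residuals, $\xi_1'$ say, 
and $x_0 - x$ is the error of the mean value. Then 

$$\int_{-\infty}^{\infty}\exp\{-n h^2 (x - x_0)^2\}\,dx = \frac{1}{h}\Big(\frac{\pi}{n}\Big)^{1/2}, \qquad (9)$$

and if we denote the posterior probability of values of x and 
h in the range dx dh by $I(x, h)\,dx\,dh$ we have 

$$I(x, h) = \Big(\frac{n}{\pi}\Big)^{1/2}\frac{h^{n-1}\exp[-h^2\{\sum\xi'^2 + n(x - x_0)^2\}]}{\int_0^{\infty} h^{n-2}\exp[-h^2\sum\xi'^2]\,dh}. \qquad (10)$$

Put 

$$\sum\xi'^2 = n\sigma'^2, \qquad (11)$$

so that $\sigma'$ is the standard residual. We have 

$$\int_0^{\infty} h^{n-2}\exp(-n\sigma'^2 h^2)\,dh = \frac{\tfrac12\,\Pi\{\tfrac12(n - 3)\}}{(n\sigma'^2)^{(n-1)/2}}, \qquad (12)$$

and 

$$I(x, h) = \Big(\frac{n}{\pi}\Big)^{1/2}\frac{2\,(n\sigma'^2)^{(n-1)/2}}{\Pi\{\tfrac12(n - 3)\}}\,h^{n-1}\exp[-n h^2\{\sigma'^2 + (x - x_0)^2\}], \qquad (13)$$

the large values of h making an inappreciable contribution in 
any case, and the small ones if $n > 1$. $I(x, h)$ does not break 
up into two factors, one a function of x and the other of h, 
so that it would not be correct to speak, on the data, of the 
probabilities of given values of $x - x_0$ and h separately. 
The probability of a value of x in the range dx, irrespective 
of h, is 

$$dx\int_0^{\infty} I(x, h)\,dh = \Big(\frac{n}{\pi}\Big)^{1/2}\frac{2\,(n\sigma'^2)^{(n-1)/2}}{\Pi\{\tfrac12(n - 3)\}}\cdot\frac{\tfrac12\,\Pi\{\tfrac12(n - 2)\}}{[n\{\sigma'^2 + (x - x_0)^2\}]^{n/2}}\,dx$$

$$= \frac{1}{\sqrt{\pi}}\,\frac{\Pi\{\tfrac12(n - 2)\}}{\Pi\{\tfrac12(n - 3)\}}\,\frac{\sigma'^{\,n-1}}{\{\sigma'^2 + (x - x_0)^2\}^{n/2}}\,dx, \qquad (14)$$

so that the posterior probability of x is not distributed 
according to the normal law. 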
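
The departure from the normal law can be seen numerically for a small set of observations. This is a sketch under stated assumptions: the five measures are invented, the prior for h is taken proportional to 1/h as in the text, and the factorial function Π(z) is evaluated as Γ(z + 1) through the standard library. The function names are hypothetical.

```python
import math

def marginal_density(x, x0, s, n):
    """Posterior density of x irrespective of h, for n observations with
    mean x0 and standard residual s."""
    # Pi{(n-2)/2} = Gamma(n/2) and Pi{(n-3)/2} = Gamma((n-1)/2).
    log_c = math.lgamma(n / 2) - math.lgamma((n - 1) / 2) - 0.5 * math.log(math.pi)
    return math.exp(log_c + (n - 1) * math.log(s)
                    - (n / 2) * math.log(s**2 + (x - x0) ** 2))

def normal_density(x, x0, s, n):
    """Normal law about x0 with standard deviation s / sqrt(n)."""
    sd = s / math.sqrt(n)
    return math.exp(-0.5 * ((x - x0) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

observations = [10.1, 9.8, 10.3, 9.9, 10.4]   # hypothetical measures
n = len(observations)
x0 = sum(observations) / n
s = math.sqrt(sum((v - x0) ** 2 for v in observations) / n)

# Crude numerical check that the density integrates to 1.
step = 0.001
grid = [x0 + step * k for k in range(-20000, 20001)]
area = step * sum(marginal_density(x, x0, s, n) for x in grid)
tail_ratio = (marginal_density(x0 + 3 * s, x0, s, n)
              / normal_density(x0 + 3 * s, x0, s, n))
print(area, tail_ratio)
```

The density integrates to unity, but for so few observations it gives far more weight to large departures from the mean than the normal law with standard deviation σ'/√n would.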

But if n is large, and $x - x_0$ small compared with $\sigma'$,

$\{\sigma'^2 + (x - x_0)^2\}^{n/2} = \sigma'^{\,n} \exp\{n(x - x_0)^2/2\sigma'^2\}$ (15)

nearly, and the probability of a given value of x is proportional
to $\exp\{-n(x - x_0)^2/2\sigma'^2\}$. The mean value is in any
case the most probable; in this case the probabilities of the
true values are distributed about it according to the normal
law with a standard deviation $\sigma'/\sqrt{n}$. Subject to the same
condition we can put $(x - x_0)^2$ equal to its standard value
$\sigma'^2/n$ in I(x, h); then

$I(x, h) \propto h^{n-1} \exp\{-nh^2\sigma'^2 (1 + 1/n)\} = h^{n-1} \exp\{-(n + 1)\,h^2\sigma'^2\}$. (16)

This is now independent of x, and may be taken to give the
distribution of probability of h. It is a maximum when

$h^2\sigma'^2 = \frac{1}{2}\,\frac{n - 1}{n + 1}$, (17)

so that the most probable value of h is nearly $1/(\sqrt{2}\,\sigma')$. Near
this value of h, $h_0$ say, the probabilities are distributed nearly
according to the law

$I(x, h) \propto \exp\{-2(n + 1)\,\sigma'^2 (h - h_0)^2\}$. (18)

The standard deviation of h is $(n + 1)^{-1/2}/(2\sigma')$.
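
The maximum and the spread quoted above can be checked directly. The sketch below takes the distribution of h in the form h^(n-1) exp{-(n+1) h² σ'²} and returns the most probable value and the standard deviation that follow from it; the function names are illustrative assumptions.

```python
import math

def log_weight(h, sigma_res, n):
    """Logarithm of h^(n-1) * exp(-(n+1) * h^2 * sigma_res^2), the weight attached to h."""
    return (n - 1) * math.log(h) - (n + 1) * (h * sigma_res) ** 2

def h_mode(sigma_res, n):
    """Most probable h, from h^2 * sigma_res^2 = (n - 1) / (2 * (n + 1))."""
    return math.sqrt((n - 1) / (2 * (n + 1))) / sigma_res

def h_std(sigma_res, n):
    """Standard deviation of h near the maximum: (n + 1)^(-1/2) / (2 * sigma_res)."""
    return 1.0 / (2 * sigma_res * math.sqrt(n + 1))
```

A grid search over `log_weight` should locate the same maximum as `h_mode`, and the curvature of `log_weight` at that point should equal -1 / h_std².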
But we must remember that it is only in a rough sense that
we can speak of the posterior probabilities of values of x and
h separately even when n is large. If we put $(x - x_0)^2$ equal
to 0, corresponding to the most probable value of $x - x_0$,
instead of to its standard value, the resulting probabilities of
h would be somewhat differently distributed. The most
probable value of h and its standard deviation are strictly
functions of $x - x_0$.

5·5. The most commonly quoted proof of the normal law of
error is that of Gauss, which appears to show that if the mean 
is the most probable value the errors must follow the normal 
law. A case has arisen above where the mean is the most 
probable value and the errors do not follow the normal law. 
It is therefore desirable to reconsider Gauss’s argument and 
see where the difference has entered. He proceeds by assuming
that the true value is x, and that the probability of an
observation within a range $dx_1$ about $x_1$ is $\phi(x_1 - x)\,dx_1$.
Then the probability of a set in the ranges $dx_1, dx_2, \dots, dx_n$ is

$\phi(x_1 - x)\,\phi(x_2 - x) \dots \phi(x_n - x)\,dx_1\,dx_2 \dots dx_n$. (1)

Given the observed values, then, the probability of a value
of x is proportional to

$\phi(x_1 - x)\,\phi(x_2 - x) \dots \phi(x_n - x)\,dx$, (2)

if the prior probability of x is uniformly distributed. This is
a maximum for variations in x if

$\frac{\partial}{\partial x}\log\phi(x_1 - x) + \frac{\partial}{\partial x}\log\phi(x_2 - x) + \dots + \frac{\partial}{\partial x}\log\phi(x_n - x) = 0$. (3)

But the postulate that the mean value is the most probable
says that this condition must be equivalent to

$(x_1 - x) + (x_2 - x) + \dots + (x_n - x) = 0$, (4)

for all values of the differences, and therefore

$\frac{\frac{\partial}{\partial x}\log\phi(x_1 - x)}{x_1 - x} = \frac{\frac{\partial}{\partial x}\log\phi(x_2 - x)}{x_2 - x} = \dots = \frac{\frac{\partial}{\partial x}\log\phi(x_n - x)}{x_n - x}$ (5)

$= 2h^2$, (6)

say, since each ratio is the same and therefore cannot vary
with x. Integrating we find that

$\phi(x_1 - x) \propto \exp\{-h^2 (x_1 - x)^2\}$, (7)

which is the normal law of error.
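
The integration that leads from (6) to (7) may be written out in one line. This is a short worked step, assuming only that the common ratio in (5) is the constant 2h²; C denotes a constant of integration introduced here for illustration.

```latex
\frac{\partial}{\partial x}\log\phi(x_1 - x) = 2h^2 (x_1 - x)
\;\Longrightarrow\;
\log\phi(x_1 - x) = -h^2 (x_1 - x)^2 + C
\;\Longrightarrow\;
\phi(x_1 - x) \propto \exp\{-h^2 (x_1 - x)^2\}.
```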

The mistake is in equation (1), which supposes that
the probability of getting all the observations $x_1, \dots, x_n$ is
the product of the probabilities of each observation separately.
It supposes, that is, that when the observations $x_1, x_2, \dots, x_{n-1}$
have been made the probability that $x_n$ will have a certain
value is just what it was at the start. It therefore constitutes 
another contradiction of the principle that it is possible to 
learn from experience. If the early observations are found 
to have a small scatter, the next will be expected to be near 
them; if they have a large scatter we shall correspondingly 
expect the next to deviate considerably from the mean of 
those already made. If they all repeat one of two constant 
values, we shall expect the next to have one of those values. 
Gauss’s proof is in fact valid if we know beforehand all about 
the distribution of the probability of error ; it is inapplicable 
when it is from the observations themselves that we are 
trying to find this distribution. 

When we say that “the normal law holds” we mean that 
there are true values of x and h such that the errors satisfy 
the normal law. In the usual practical case the possible 
values of x and h are scattered over a wide range, the normal 
law holding for each pair of values. If we try to assess the 
total prior probability of an observed value $x_1$ for a given x,
by adding up the contributions for all values of h, the
result is not of the normal form; if P(x, h) is proportional
to 1/h, the prior probability of a given $\xi$ is proportional to

$\int_0^\infty \exp(-h^2\xi^2)\,dh$,

which is not proportional to $\exp(-h_0^2\xi^2)$ for any value of $h_0$.

Similarly the posterior probabilities are not of the normal 
form, even when the normal law holds. It is the component 
probability from each pair of values of x and h that is referred 
to when we speak of the normal law of error ; any attempt to 
compound probabilities destroys the normal form. 
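
The effect of compounding can be illustrated numerically. The sketch below assumes the normal law in the form (h/√π) exp(-h²ξ²) and a prior probability of h proportional to 1/h, so that the compounded weight of an error ξ is the integral of exp(-h²ξ²) over positive h. The truncation point `h_max` and the step count are arbitrary illustrative choices.

```python
import math

def compounded_weight(xi, h_max=200.0, steps=100000):
    """Midpoint-rule value of the integral of exp(-(h * xi)^2) over 0 < h < h_max."""
    width = h_max / steps
    return width * sum(math.exp(-(((k + 0.5) * width) * xi) ** 2) for k in range(steps))

def scaled_weights(xi_values):
    """Return xi * weight(xi) for each xi; a constant list means the weight varies as 1/|xi|."""
    return [xi * compounded_weight(xi) for xi in xi_values]
```

Under these assumptions the product of ξ and its weight comes out the same for every ξ, so the compounded weight falls off as 1/|ξ| and cannot be matched by exp(-h₀²ξ²) for any single h₀.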

5·6. In addition to errors with probabilities following the
normal law and those arising from the step of the instrument 
many other types exist. The probability of a given distribu- 
tion of error, before the observations are taken, is in each case 
quite definite, but involves taking into account the whole of 
our previous knowledge about what distributions of error 
have occurred in the past. Its calculation would be over- 
whelmingly laborious, and the effect on the result would in 
most practical cases not be worth while. If there is no strong 
and obvious reason to expect any particular law of error in a 
given case, there is no better plan than to take a large number 
of observations and draw a smooth curve to represent the 
frequency of their departures from some convenient standard 
value. But the question arises, what in this case is the most 
probable value? The answer will depend on the circum- 
stances. If the observations show a strong tendency to 
collect about two definite values, that fact is evidence that the 
errors arise from some disturbing factor with a finite step, 
and we cannot do better than to take the arithmetic mean. 
They may be approximately symmetrically distributed about 
the mean value ; in that case also, if there is no previous reason 
to expect the errors to be predominantly of one sign, we may 
take the mean value as the most probable. But it may turn 
out that their distribution is noticeably asymmetrical. The 
observations on one side of the mean may be few, but with 
large deviations, while those on the other side are many with
small deviations. In that case the placing of the most prob- 
able value with respect to the mean requires either assessment 
of the prior probability or special examination of the actual 
causes of the errors. If neither is carried out an uncertainty 
about the position of the most probable value necessarily 
remains. Three alternatives are usually considered in such a 
case: the arithmetic mean, the mode, and the median. The 
median is defined by the condition that as many observations 
exceed it as fall short of it ; the mode is such that the number 
of observed values for a given range is greatest there. In 
general with asymmetrical distributions all three are different. 
The median would be the most probable if a positive error is 
as likely as a negative one, irrespective of their magnitudes ; 
the arithmetic mean may be the most probable if the magni- 
tudes of the errors matter. Both alternatives may arise in 
different cases. The mode is the most probable if there is 
some reason to expect that the errors arise from special 
causes not present at all in the majority of the observations. 
The use of the mean as the adopted value has the advantage 
that it makes the standard deviation a minimum. The median 
has the advantage that we can divide the observed values, in 
order of magnitude, into four classes, each containing as 
nearly as possible the same number of observations. The 
median comes at the boundary between the two middle 
classes, while the extremes of the two middle classes specify 
a range such that a given observation is as likely to lie within 
it as outside it. In this sense such a classification determines 
a probable error; or rather two probable errors, one for 
positive and the other for negative errors. In no case will the 
probable error of a single observation, that of the adopted 
value, nor the mean square error, follow the same quanti- 
tative rules as have been determined for cases where the 
normal law holds. 
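
The quantities compared in this section are easily computed for a given set of observations. The sketch below is a minimal illustration using Python's standard `statistics` module; the function name and the dictionary keys are assumptions made here, and the mode is omitted because it depends on a choice of grouping interval that the text does not fix.

```python
import statistics

def summarize(observations):
    """Mean, median, and the two probable errors read off from the quartiles."""
    ordered = sorted(observations)
    # Three cut points dividing the ordered values into four classes of nearly equal size.
    lower_quartile, median, upper_quartile = statistics.quantiles(ordered, n=4)
    return {
        "mean": statistics.fmean(ordered),
        "median": median,
        # Half the observations lie between the quartiles, so these play the part
        # of probable errors for negative and for positive errors respectively.
        "probable_error_below": median - lower_quartile,
        "probable_error_above": upper_quartile - median,
    }
```

For an asymmetrical set the mean and the median differ, and the two probable errors are unequal.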

If the only purpose of the observations is to determine a 
single quantity as accurately as possible, and the errors turn
out to be asymmetrically distributed, there seems to be
nothing to do but to consider which of the conditions for the 
arithmetic mean, the median, and the mode is the most 
likely to be applicable in the given case, and to choose the 
adopted value accordingly. A method often considered in 
such a case is to attempt to allow for the terms in $P_3, P_4, \dots$
and so on in 5·3 (21). Thus

( 00 P, T 

^ ^ A:1 ap) 2m exp (- J Pg/c*) dK 

= P (^), subject merely to a convergency condition, 
so that we can write 


PiO = (i+ s (- 

\ A:»3 






The expansion can then be carried out; the terms are known
functions of $\xi$ with adjustable coefficients involving the $P_m$.
By an extension of the method used for finding the posterior
probabilities of values of x and h when the normal law holds,
we can now use the distribution of the observations to find
both $P_2$ and the higher $P_m$ as closely as possible, and still to
estimate the distribution of the posterior probability among 
various values of x. But it seems to me that such a procedure 
can lead nowhere. The normal law is valid as it stands when 
there are a large number of component errors of comparable 
magnitude. If it requires modification, it is because there 
are one or a few sources of error of predominating import- 
ance, and the law of error is determined mainly by these. If 
the extended law is applied it will only lead back to the law of 
the dominant component error, whatever that may be; and 
if the observations cannot determine that directly no modi- 
fication of the normal law will do so, for the condition for a 
few terms of the series to give an approximation to the whole 
is not satisfied, all the terms in fact being of the same order
of magnitude. 

5·7. Another warning is needed with regard to the quantities
$s_{r1}$. If these do not vanish, the error that we have shown in
certain cases to follow the normal law does not arise directly
from the actual component errors $\epsilon_r$, but from their
differences from their associated $s_{r1}$. In fact $\xi'$ is not $\sum a_r\epsilon_r$ but

$\xi' = \sum_{r=1}^{n} a_r(\epsilon_r - s_{r1}) = \sum_{r=1}^{n} a_r\epsilon_r - \sum_{r=1}^{n} a_r s_{r1} = \xi - \sum_{r=1}^{n} a_r s_{r1}$.

The observed value is $x + \xi$, that is, $x + \sum_{r=1}^{n} a_r s_{r1} + \xi'$, where
$\xi'$ follows the normal law. Then however many observations
we may use to determine our mean, the quantity that has the
mean for its most probable value is not x, the true value, but
$x + \sum_{r=1}^{n} a_r s_{r1}$. We can never find the true value from this
without some knowledge of the sum $\sum_{r=1}^{n} a_r s_{r1}$, which affects every
observation equally. It is usual to call such a constant error
the systematic error, while the deviations that do satisfy
the normal law are called accidental errors.

If now the probability of error is asymmetrically dis- 
tributed, that means that errors of one sign are likely to be 
more frequent or larger than those with the other sign; in 
either case a systematic error is to be expected. In any case, 
that is, where the distribution is asymmetrical, the existence 
of a systematic error may be inferred. But it may exist also 
where the distribution is symmetrical. In either case the 
observations give us no means of evaluating it; this can be 
done only by way of other considerations. 

At first glance the problem of systematic error seems to 
stultify our whole procedure; for it means that, however 
many observations we may take, the difference between the 
adopted value and the true value remains unknown. Yet we 
still have our assurance that the observed values nearly satis- 
fied the physical law under test; the adopted value cannot be 
far wrong. At the worst we could make it a convention to take
the mean as the adopted value in the case of a quantity known
to be nearly constant; or we could always find the parameters
in a law by the method of least squares. The true value in any
case does not differ much from the adopted value; the question
at issue is how much it is likely to differ. This reduces to the task
of evaluating the systematic error, which is in any case small,
of the order, for instance, of the difference between the mean 
and the median. We may attempt to do this from previous 
knowledge, by measuring other variables directly and allowing 
for them ; or we may determine the quantity under considera- 
tion by means of other laws that involve it and may give 
different systematic errors, and then compare the results. 
The differences may indicate the nature and extent of the 
systematic errors and suggest means of tracing them to their 
causes. In fact systematic errors are, and always will be, the 
curse of the present and the hope of the future. 

5·71. We may be interested in the arithmetic mean for
other reasons; for instance, it is wanted directly in the evaluation
of an integral. Suppose that the true value is x, and the
error of an observation $\xi$. Let the probability of an error
between $\xi$ and $\xi + d\xi$ be $E(\xi)\,d\xi$. We take

$\int_{-\infty}^{\infty} E(\xi)\,d\xi = 1$; $\int_{-\infty}^{\infty} E(\xi)\,\xi\,d\xi = s$; $\int_{-\infty}^{\infty} E(\xi)\,(\xi - s)^2\,d\xi = \sigma^2$, (1)

so that $\int_{-\infty}^{\infty} E(\xi)\,\xi^2\,d\xi = \sigma^2 + s^2$. (2)

Now consider a set of observations $\xi_r$, n in number, and their
mean $\xi_0$:

$n\xi_0 = \sum_{r=1}^{n} \xi_r$. (3)

Then the probability that the mean would be in a range $d\xi_0$ is

$\frac{h}{\sqrt{\pi}} \exp\{-h^2 (\xi_0 - s)^2\}\,d\xi_0$, (4)

provided the conditions of 5·3 are applicable. The standard
error is the same for each observation, since all are made in
the same conditions, and they are independent. Hence, if
$\sigma_0$ is the standard error of the mean,

$\sigma_0^2 = \sum \frac{1}{n^2}\,\sigma^2 = \frac{\sigma^2}{n}$; $h^2 = 1/2\sigma_0^2$. (5)

The conditions required hold provided that n is large. The
probabilities of the mean value are therefore distributed
about x + s according to the normal law even if those of the
original observations are not. Further,

$\sum_{r=1}^{n} (\xi_r - s)^2 = \sum_{r=1}^{n} (\xi_r - \xi_0)^2 + 2(\xi_0 - s)\sum_{r=1}^{n} (\xi_r - \xi_0) + n(\xi_0 - s)^2$. (6)

Of the terms on the right, the second is zero. The first can
be found from the observations and denoted by $n\sigma'^2$, where
$\sigma'$ is the standard deviation. The last may be zero, but we
may suppose, seeing that the standard value of $(\xi_r - s)^2$ is $\sigma^2$
and that of $(\xi_0 - s)^2$ is $\sigma^2/n$, that the ratio of the corresponding
sums is n : 1. This is an approximation, which will sometimes
exceed and sometimes fall short of the truth. Then we can take

$n\sigma^2 = n\sigma'^2 + \sigma^2$, (7)

or

$\sigma^2 = \frac{n}{n - 1}\,\sigma'^2$; $\sigma_0^2 = \frac{\sigma'^2}{n - 1}$. (8)


This approximation is subject to the same sort of uncer- 
tainty as arose in dealing with the determination of the 
standard error when the normal law is satisfied. We have 
obtained in this way an estimate of the standard error of the 
arithmetic mean when the actual law of error is not the 
normal one. 
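
The estimate just obtained can be put into a few lines of code. The sketch below assumes only the relations σ² = nσ'²/(n - 1) and σ₀² = σ'²/(n - 1), where σ' is the standard deviation of the observations about their own mean; the function name is an illustrative choice.

```python
import math

def standard_error_of_mean(observations):
    """Return the mean, the estimated sigma of one observation, and sigma_0 of the mean."""
    n = len(observations)
    if n < 2:
        raise ValueError("at least two observations are needed")
    mean = sum(observations) / n
    # sigma'^2: mean square residual about the observed mean.
    residual_variance = sum((value - mean) ** 2 for value in observations) / n
    sigma = math.sqrt(n * residual_variance / (n - 1))
    sigma_0 = math.sqrt(residual_variance / (n - 1))
    return mean, sigma, sigma_0
```

No assumption about the form of the law of error enters, beyond the conditions that make the mean itself nearly normal when n is large.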

5·72. It often happens that the quantity sought can be found
from several different types of data. The mean distance of 
the sun, for instance, may be found from observations of 
the transit of Venus, observations of Mars or an asteroid near
opposition, from the moon’s parallactic inequality, or from
the aberration of light. Suppose that the true value is x and
that we have several methods of measurement. In the rth
method the probability, given x, of an error between $\xi_r$ and
$\xi_r + d\xi_r$ is $E_r(\xi_r)\,d\xi_r$ for a single observation. Denote an
individual observation in the rth series by $\xi_{rs}$, and suppose
that there are $n_r$ such observations. Consider the sum

$\xi = \sum_r \sum_s a_r \xi_{rs} = \sum_r n_r a_r s_r + \sum_r \sum_s a_r (\xi_{rs} - s_r) = \xi_0 + \xi'$, (1)

say, where the $a_r$ are constants. Then in certain conditions
the probability that $\xi'$ will lie in a given range is $P(\xi')\,d\xi'$,
where

$P(\xi') = \frac{h}{\sqrt{\pi}} \exp(-h^2 \xi'^2)$ (2)

and

$h^2 = 1/2\sigma^2$; $\sigma^2 = \sum_r \sum_s a_r^2 \sigma_r^2 = \sum_r n_r a_r^2 \sigma_r^2$. (3)

When all the observations are equal we want $\xi$ to have the
same value. Hence we take

$\sum_r \sum_s a_r = \sum_r n_r a_r = 1$. (4)

Otherwise the $a_r$ are at our disposal.

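Suppose that we want to make $\sigma^2$ as small as possible. We
introduce a multiplier $\lambda$ and say that

$\tfrac{1}{2}\,d\sigma^2 = \sum_r n_r (a_r \sigma_r^2 - \lambda)\,da_r = 0$, (5)

for all $da_r$, if $\lambda$ is chosen suitably. Hence

$a_r = \lambda/\sigma_r^2$, (6)

and finally

$a_r = \sigma_r^{-2} \Big/ \sum_r \frac{n_r}{\sigma_r^2}$, (7)

$\sigma^2 = \sum_r n_r a_r^2 \sigma_r^2 = \lambda = 1 \Big/ \sum_r \frac{n_r}{\sigma_r^2}$. (8)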
But the $\sigma_r$ are the standard errors of $\xi_{rs} - s_r$ as found from the
separate methods; if $\sigma_r'$ is the observed standard deviation
in each and $\sigma_{r0}$ the standard error of the mean value we have
nearly

$\sigma_r^2 = \frac{n_r}{n_r - 1}\,\sigma_r'^2 = n_r \sigma_{r0}^2$. (9)

If then we determine the standard error of each mean value
as in 5·71 we have

$\frac{1}{\sigma^2} = \sum_r \frac{1}{\sigma_{r0}^2}$. (10)


The conditions for the validity of the normal law of error are 
that the individual errors shall be independent, which they 
are; and that the largest contributions to $\xi'$ from the
individual errors shall arise in comparable amounts from a
large number of them and not from only a few. The latter
condition is satisfied by the terms in $\xi'$ arising from a single
series of observations, and a fortiori from those from all the 
series together. The probabilities of errors in weighted means 
derived from several series of observations therefore satisfy 
the normal law ; and the standard error can be computed from 
the means and standard errors of the separate series by the 
same methods as are applicable if the probabilities of error 
in each series are distributed according to the normal law. 
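
The weighting rule of this section reduces to a short computation once each series has been summarized by its mean and the standard error of that mean. The sketch below assumes the relations $n_r a_r \propto 1/\sigma_{r0}^2$ and $1/\sigma^2 = \sum 1/\sigma_{r0}^2$; the function name and argument layout are illustrative, and systematic error is ignored entirely.

```python
import math

def combine_series(means, standard_errors):
    """Weighted mean of several series and its standard accidental error.

    Each series is weighted by the inverse square of the standard error of its mean.
    """
    weights = [1.0 / se**2 for se in standard_errors]
    total_weight = sum(weights)
    combined = sum(w * m for w, m in zip(weights, means)) / total_weight
    return combined, math.sqrt(1.0 / total_weight)
```

For example, two series with means 10 and 12 and standard errors 1 and 2 receive weights in the ratio 4 : 1, so the combined value lies much nearer to 10.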

The practice of weighting the means from the separate 
series according to the inverse squares of their standard errors 
is open to some objection, because it neglects the question of 
systematic error. The quantity that follows the normal law is 
not the actual error of the final mean, but this error less by

$\sum_r n_r a_r s_r = \sum_r \frac{n_r s_r}{\sigma_r^2} \Big/ \sum_r \frac{n_r}{\sigma_r^2}$. (11)


To make the error of the final mean small we want not only 
to make its standard accidental error as small as possible, but 
also to reduce as far as we can its systematic error. To choose 
the ar as in (7) achieves the first object; but there is little 
ground for supposing that the same choice is suitable for the
second object. In particular the $a_r$, as we have chosen them,
depend on the number of observations in the series; the 
systematic errors do not. To put the matter in another way, 
the means derived from the separate series in general differ. 
The differences arise partly from the fact that the different 
methods give different systematic errors, and partly from 
accidental errors. The latter can be reduced indefinitely by 
taking enough observations, but no number of observations 
will reduce the systematic errors. If the means differ by 
amounts large compared with their standard errors, it is fair 
to infer that the differences arise from systematic error, and 
the weights assigned are illusory. If we have previous reason 
to expect systematic error from any method, its amount may 
be inferred from the differences between the mean given by 
that method and those given by the others. If all the methods 
are initially equally likely to have systematic errors of a given 
amount, we should take a simple unweighted mean, at any 
rate until the causes of the outstanding discrepancies have 
been investigated. 
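
The advice of the last paragraph can be sketched as a simple decision rule. In the illustration below the discrepancy between two series is measured in units of the standard error of their difference, taken as the root of the sum of squares on the assumption that the accidental errors of the series are independent. The cut-off `threshold` is a hypothetical value chosen for illustration; the text gives no numerical figure.

```python
import math

def discrepancy(mean_a, se_a, mean_b, se_b):
    """Difference of two series means divided by the standard error of that difference."""
    return abs(mean_a - mean_b) / math.sqrt(se_a**2 + se_b**2)

def adopted_value(means, standard_errors, threshold=3.0):
    """Weighted mean if all series agree; simple unweighted mean otherwise.

    Needs at least two series. threshold is an assumed cut-off, not taken from the text.
    """
    worst = max(
        discrepancy(means[i], standard_errors[i], means[j], standard_errors[j])
        for i in range(len(means))
        for j in range(i + 1, len(means))
    )
    if worst > threshold:
        # Large discrepancies point to systematic error, so the weights are illusory.
        return sum(means) / len(means), "unweighted"
    weights = [1.0 / se**2 for se in standard_errors]
    return sum(w * m for w, m in zip(weights, means)) / sum(weights), "weighted"
```

The rule does nothing to remove systematic error; it only avoids giving a long series undue influence when the series plainly disagree.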

5·8. There is a common type of error, which arises from the
co-operation of a large number of causes of comparable im- 
portance, together with one or a few that affect only a small 
fraction of the observations, but produce large errors when 
they do occur. One such example has been mentioned already, 
when an astronomer observing a transit makes a miscount of 
a second. Such a cause implies an incompleteness in the 
normal law of error and therefore casts doubt on the adoption 
of the arithmetic mean as the most probable value. We require 
a criterion for recognizing observations so affected when they 
occur. An absolute criterion is impossible, for a deviation of 
any magnitude is theoretically possible, even when the normal 
law applies and the standard error is known already. But 
errors greater than a moderate multiple of the standard error 
are so rare that we may say that when they arise they probably 
come from some unusual cause. If so, we shall be justified in
rejecting them and determining the adopted value and the 
standard error of the adopted value from the others. This 
course will sometimes be mistaken, because they may really 
arise as the large errors to be expected occasionally from the 
normal law itself; and if the normal law is applicable to the 
whole of the observations the most probable value is the 
arithmetic mean, and the mean after rejecting an observation 
is not the most probable value. 

In the circumstances we are considering the error $\xi$ is of
the form $\xi_1 + \xi_2$, where $\xi_1$ follows the normal law. The
probability of values of $\xi_2$ follows the law

$E_2(\xi_2)\,d\xi_2 = \frac{m - 1}{m}$ when $\xi_2 = 0$ lies within $d\xi_2$, (1)

$\int_{-\infty}^{\infty} E_2(\xi_2)\,d\xi_2 = \frac{1}{m}$ when a range about $\xi_2 = 0$ is excluded, (2)

$m \int_{-\infty}^{\infty} E_2(\xi_2)\,\xi_2^2\,d\xi_2 = \sigma_2^2$ with the same restriction. (3)

Then the error $\xi_2$ arises in only 1/m of the cases, but if its
standard value is found from the cases when it can arise it
much exceeds $\sigma_1$, the standard error of those observations that
do follow the normal law.

In practice m and the form of $E_2(\xi_2)$ are initially unknown.
The question is whether, from a given set of observations,
we can infer with considerable posterior probability that m is
finite, and that one or more of the observations have been
affected by the error $\xi_2$. If so we are justified in rejecting
them. Suppose then that we have n observations, that the
largest residual is $\xi$ and that the standard error as computed
from the whole of the observations is $\sigma$. Then if $\xi$ is a fairly
large multiple of $\sigma$ the probability of getting one observation out
of n in the range $d\xi$ is nearly $n(2\pi)^{-1/2}\sigma^{-1}\exp(-\xi^2/2\sigma^2)\,d\xi$ if the
normal law and this value of $\sigma$ are correct. Consider now the
probability of an error in this range from some law other than 
the normal one. The aggregate of all such laws must be considered.
It is plain that the chief contribution will come from
those with m of the same order as n; for if m was much less
than n we should expect a large fraction of the observations
to be affected, while if m was much greater than n it would be
unlikely that any would. Similarly the chief contribution will
come from the values of $\sigma_2$ of the same order as $\xi$. For given
m and $\sigma_2$ the probability of an error in a range $d\xi$ is of order
$d\xi/4m\sigma_2$. The prior probabilities that m and $\sigma_2$ lie within the
requisite ranges may be taken to be fractions, but not very
small ones; let us say 1/4 each. Then the prior probability of an
error in the actual range, derived by way of such laws, is of
order $d\xi/64m\sigma_2$, or of $d\xi/64n\xi$. The numerical coefficient is
obviously capable of great variation. The prior probability of
the normal law being of order unity, we can say that the ratio
of the posterior probability that the error $\xi_2$ has contributed
to $\xi$ to the probability that it has not, is of order

$\frac{\sqrt{2\pi}\,\sigma}{64 n^2 \xi} \exp(\xi^2/2\sigma^2) = \frac{\sigma}{30 n^2 \xi} \exp(\xi^2/2\sigma^2)$

roughly. This gives a workable criterion. If this ratio is
greater than 1, we may reject the observation; if it is less than
1, we should retain it. In other words, our observation may be
rejected if $(\sigma/30\xi) \exp(\xi^2/2\sigma^2)$ is greater than $n^2$. We have the
following values:

  $\xi/\sigma$     $(\sigma/30\xi) \exp(\xi^2/2\sigma^2)$
    1          0·03
    2          0·12
    3          1·00
    4         25
    5       1800


The question of rejecting an observation therefore does not
arise unless $\xi/\sigma$ is over 3; if we have 5 observations we may
reject an observation with $\xi/\sigma$ greater than 4; if we have 40
observations we may reject one with $\xi/\sigma$ greater than 5; but
then the function increases so rapidly that with any practicable
number of observations we should reject those with $\xi/\sigma$
greater than 6, and the inaccuracy of the coefficient is a
matter of trivial importance.
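
The criterion lends itself to a short computation. The sketch below reads the rule as: reject when (σ/30ξ) exp(ξ²/2σ²) exceeds n². That reading is an assumption, chosen because it agrees with the tabulated values for ξ/σ from 2 to 5 and with the worked cases of 5 and 40 observations. The function names are illustrative.

```python
import math

def rejection_index(residual, sigma):
    """Value of (sigma / (30 * |xi|)) * exp(xi^2 / (2 * sigma^2)) for a residual xi."""
    t = abs(residual) / sigma
    if t == 0.0:
        return 0.0  # a vanishing residual is never a candidate for rejection
    return math.exp(0.5 * t * t) / (30.0 * t)

def may_reject(residual, sigma, n):
    """True when the index exceeds n^2, n being the number of observations."""
    return rejection_index(residual, sigma) > n * n
```

Since the coefficient 30 is admittedly rough, the result should be treated as a guide to the order of magnitude of the residual at which rejection becomes reasonable, not as a sharp boundary.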

A common astronomical practice is to reject automatically 
observations with residuals greater than 5 times the probable 
error, or 3·4 times the standard error, and to reject those
with residuals greater than 3 times the probable error, or
2·0 times the standard error, if there is any intrinsic ground
for doubting those particular observations. From the above 
considerations it appears that these rules are somewhat too 
stringent; 5 times and 3 times the standard error instead of 
the probable error would be better. 





CHAPTER VI 


PHYSICAL MAGNITUDES*

Multiplication is vexation ; 

Division’s just as bad ; 

The Rule of Three perplexes me. 

And Practice drives me mad. 

Nursery Rhyme 

6·1. The fundamental notion of any quantitative science is
number. In its most elementary form this means the number 
of a class, and depends on the notion of the cardinal com- 
parison of classes. Two classes of objects are said to be similar 
if their members can be arranged in pairs, one from each class, 
so that to every member of the one class corresponds one of 
the other, and none are left over. If such a correspondence is 
not possible the classes are not similar. Then any two similar 
classes have something in common, which is not shared by 
any class not similar to them. This property we call their 
number. All propositions about number are really propositions 
about the comparison of classes. In the works of Russell and 
Whitehead the definition is made apparently more precise by 
defining the number of a class as the class of all classes similar 
to the given class ; this class being the same whatever one of 
the similar classes we begin with, all the classes have on this 
definition obviously the same number. But it might appear 
that on this definition the creation of a new class (a new set of 
ten things, for instance) makes some change in the class of all 
similar classes, and we cannot allow the number of a known 
class to be changed by such an event. Actually, however, 
Whitehead and Russell, in their Principia Mathematica, do
not use this definition in practice; for they never explicitly 
use the notion of a class at all. They proceed by attaching a 

* For a great many of the ideas in this chapter I am indebted to Dr
N. R. Campbell’s Physics: The Elements, though I do not agree with all
he says. Cf. Phil. Mag. 46, 1923, 1021-1025.
meaning to every proposition about the class, or the class of
classes, which can be understood in terms entirely of more 
elementary ideas, but a class as such is never defined. From 
a physical point of view there seems to be no harm in sup- 
posing directly that classes exist and that similar classes 
have a common property, which we call their number. The 
advantages of the method of Whitehead and Russell are that 
it makes it possible to give a meaning to any proposition about 
numbers whether classes actually exist or not, and that it 
avoids the logical difficulties associated with the theory of 
types ; but for our purposes these appear to be unnecessary 
refinements*. 

From the notion of number we can proceed to those of the 
sum and product of two numbers. If two classes have no 
common member, and we form the class of the two together, 
the number of this class is called the sum of the numbers of 
the original classes. If we form the class of all possible pairs 
of members of the two classes, the number of this class is 
called the product of those of the original classes. If a class 
has no member its number is called 0. If when a is a member
of a class any member of the class is identical with a, the
number of the class is called 1. If a unit class is combined
with a different unit class, the resulting class is said to have
number 2, and so on. In this way the finite whole numbers
can be defined, and their arithmetic can then be developed. 
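
These definitions can be imitated with finite sets. The sketch below is only an illustration: classes are represented by Python sets, similarity is tested by pairing members off one at a time without counting either class first, and the function names are assumptions introduced here.

```python
def similar(class_a, class_b):
    """True if the members can be paired off, one from each class, with none left over."""
    left, right = list(class_a), list(class_b)
    while left and right:
        left.pop()   # remove one member of the first class
        right.pop()  # and pair it with one member of the second
    return not left and not right

def class_sum(class_a, class_b):
    """Class of the two together; defined only when they have no common member."""
    if class_a & class_b:
        raise ValueError("the classes must have no common member")
    return class_a | class_b

def class_product(class_a, class_b):
    """Class of all possible pairs, one member taken from each class."""
    return {(a, b) for a in class_a for b in class_b}
```

A class of two members combined with a disjoint class of three is similar to any class of five, and their class of pairs is similar to any class of six, which is the content of 2 + 3 = 5 and 2 × 3 = 6.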

6·11. Number is an abstraction. When classes are similar
in terms of our method of comparison of classes, member 
to member, we say that they have a common property, 
which we call their number. We say in fact that they 
have the same number, which is different from the number of 
any class not so comparable with them. If in whatever way 
the members of two classes are paired off there are always 
still some members of one left over when those of the other are 

* Cf. Wittgenstein, Tractatus Logico-Philosophicus; F. P. Ramsey,
Proc. Lond. Math. Soc. 25, 1926, 338-384.
exhausted, the class with the unpaired members is said to have
the greater number, the other to have the smaller number. The 
observed fact is the result of the comparison ; the property com- 
mon to similar classes is an abstract idea derived from it. This 
derivation by abstraction is a logical step, and is of extremely 
wide application. We experience a similar sensation from the 
sight of blood, a brick, a sunset, and a Canadian apple; we 
abstract a common property, which we call redness, and which 
is not possessed by the midday sky, a lemon, or a tablecloth. 
All qualifying adjectives depend for their meaning on such 
processes, of different complexity in different cases. In such 
an expression as “ten men”, “ten” is not an adjective
qualifying “men”; this is seen at once if we try to attribute 
a meaning to “a ten man”. “Ten” here qualifies a class of 
men; “ten men” really means “every man in a ten class of 
men”. Sometimes, when objects are classified in terms of 
some method of comparison, the classes can be arranged in 
some definite order suggested by the method of comparison 
itself; thus we attach meanings not only to the statement that 
classes have the same number, but to the statement that one 
class has a greater or smaller number than another, and this 
makes it possible to arrange numbers in a definite order. This 
is the fundamental requirement of a physical magnitude. It 
is not possessed by all abstractions. For instance, we can 
classify objects according to the colour-sensation they give. 
But there is no direct reason suggested by our method of 
comparing objects according to colour to indicate what should 
be the order of arrangement of red, yellow, and brown. For 
this reason colour is not a physical magnitude. In the case 
of the pitch of a note, we can say directly from sensation that 
one note is higher or lower than another, and all pitches can 
be arranged in a single series based on this comparison. Some- 
thing more is needed, however, before we can measure pitch. 
The existence of an order is necessary to measurement, but 
other conditions must be satisfied before we can make a 
quantitative determination. 
6·2. The quantities capable of being measured directly are
called fundamental magnitudes. Their character can be 
shown by considering one of the most important, namely 
length. When two objects can be placed so that they are in 
contact at both ends, we find by experiment that calipers or 
compasses adjusted so that they fit one object will also fit the 
other. Objects can then be classified together if they fit the 
calipers when the latter are kept in the same adjustment. 
We abstract the common property, which we call the length 
of the objects. But the method of comparison by juxta- 
position of the objects, either directly or by way of the 
calipers, suggests a way of arranging them in order. If the 
calipers have to be set to a greater angle to fit one object than 
another, we say that the first has the greater length; our 
method of comparison not only gives a meaning to length, 
but arranges different lengths in order, so that any length is 
greater than any that precedes it in the order and less than 
any that follows it. Thus length, so far, is on the same footing 
as the pitch of a note. But there is a difference. 

Consider the method of construction of a millimetre scale. 
A long screw is fixed so that it can turn in a bearing with a 
screw thread inside it. Whatever part of the screw is within 
the bearing, it fits. Every turn of the screw fits any turn of 
the bearing. In terms of our method of comparison, every 
turn of either therefore has the same length. In the manu- 
facture of the scale, it is arranged that whenever the screw 
advances through a complete turn a device attached to it rules 
a transverse line on the scale. The object whose ends are two 
consecutive scale-divisions is therefore compared directly 
with the turn of the screw, which is known to have always the 
same length. Hence by the very definition of length every 
interval between consecutive divisions on the scale has the 
same length. When we measure a length we place the ends of 
the object in contact with the scale, or we apply calipers to 
the ends and apply the calipers to the scale ; and we count the 
scale-intervals between the ends. The statement that the
length of an object is 153 mm. means then that the object
has the same length as the object formed by placing 153 
scale-intervals end to end, all the intervals by construction 
having the same length as the turn of a certain standard screw. 
We see now the difference between a length and the pitch of 
a note. When we put two objects together end to end along 
a scale we get a new object determined by the extreme ends ; 
we say that in terms of our method of measurement, merely 
by counting scale-intervals, the combined object has a mea- 
sure equal to the sum of those of the separate objects. But 
if we sound two notes of different pitches we do not get a 
single note of a new pitch. If the notes are sounded together 
we get a chord; if they are sounded in succession they give 
two distinct notes. 

We can now specify in what conditions a property can be 
a fundamental magnitude. It must be possible to construct a 
scale such that every interval of the scale is the same in 
respect of that property, the test of being the same being 
comparison with some definite standard by the process that 
enables us to recognize differences in that property. The in- 
tervals must be consecutive, and the object must be measured 
by counting the number of intervals that it overlaps. When 
this is done the measure of the property is a fundamental 
magnitude. It has the property that if two objects of measures 
x and y are placed consecutively, the measure between the
extremes is x + y.
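
The additive property can be illustrated by a minimal sketch in Python. The function name `measure` and the particular scale marks are assumptions introduced only for illustration; the measure of an object is taken, as in the text, to be the count of scale-intervals between the divisions at its two ends.

```python
def measure(start_mark: int, end_mark: int) -> int:
    """Count the scale-intervals between two scale-divisions."""
    return end_mark - start_mark

# Two objects laid end to end along the scale (hypothetical marks).
x = measure(0, 153)         # first object spans divisions 0 to 153
y = measure(153, 200)       # second object starts where the first ends
combined = measure(0, 200)  # object determined by the extreme ends

# The measure does not depend on where the first end is placed.
shifted = measure(10, 163)

print(x, y, combined, shifted)
```

The combined count equals the sum of the separate counts, and shifting the object along the scale leaves its count unchanged.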

Length is a fundamental magnitude. Angle, as measured 
by a protractor graduated in degrees, is another, for each 
degree-interval is compared with a standard length in the con- 
struction of the instrument. Time, or rather the interval of 
time required for a given process, is another. It is measured 
by counting the swings of a pendulum or a balance wheel, 
which occur in a definite order, so that each has an immediate 
successor, and this order is recognized directly by sight or 
sound. If two different mechanisms once take the same time 
to perform an oscillation, they do so again when compared 
again. When two processes, started at the same instant, also
end at the same instant, we classify them together and
abstract the common property of the interval of time taken. 
We measure this by counting the number of oscillations of a 
standard instrument, say a seconds’ pendulum, the balance 
wheel of a watch, or a tuning-fork, that take place during 
either process. By its essential structure the interval is there- 
fore a fundamental magnitude. It is important that the in- 
terval is independent of the actual instant when the process 
starts, just as a length measured on a scale as the number of 
intervals overlapped by the object is independent of the 
position on the scale of the end first placed in position. Time 
may also be measured in terms of the rotation of the earth; 
the interval taken by the earth to turn through a standard 
angle is taken as the step, and any interval is measured as the 
number of times the earth has turned through this angle 
during the process. Angle being a fundamental magnitude, 
interval of time as measured in terms of it is another. 

Mass, as found from a balance, is another fundamental 
magnitude. The bodies we call our “weights” are con- 
structed so that they all counterbalance the same body on the 
other pan; and we can recognize when a body more than 
counterbalances, or fails to counterbalance, a body on the 
other pan. We classify together bodies that counterbalance 
the same body, and abstract the property of mass. If then a 
body counterbalances the same body as is counterbalanced by 
n of our standard weights, we say that its mass is n in terms 
of these weights. The number n is obtained directly by 
counting, and is evidently a fundamental magnitude. 

6·3. Every fundamental magnitude is measured in terms of
a certain property of its own kind, which we call the step 
of the instrument. In the case of number the step is the 
number 1. In the case of length, it is the length of the
interval on the scale, or ultimately that of a turn of the fixed
screw thread. For time, it is the interval between instants 
when the pendulum, balance wheel, or tuning fork passes its 
equilibrium position. For mass, it is the mass of the standard
weight. In all cases but number itself the standard is to a
large extent at our disposal. In the case of length, for in- 
stance, instead of using a millimetre scale we might use a 
scale depending on a different screw thread, giving a scale 
divided into tenths of inches. The numbers obtained by 
measuring the same object on the two scales are different; 
the standard therefore matters. But we can compare different 
standards. Thus we find that an object measured in terms of 
a millimetre scale overlaps 254 intervals; measured in terms 
of a tenth-inch scale it overlaps 100 intervals. If we like we 
can test one scale against the other directly. The length of 
100 intervals on the tenth-inch scale is then the same as the 
length of 254 intervals on the millimetre scale; it is the 
common property revealed by the method of comparison. 
Now such stretches on a scale may be placed end to end, and 
by the additive property of fundamental magnitudes it 
follows that if an object has the same length as 100x intervals
on a tenth-inch scale, where x is any whole number, it also
has the same length as 254x intervals on a millimetre scale.
If we consider an object that covers 10 intervals on a tenth-inch
scale, we cannot say immediately that it will cover 25.4
intervals on a millimetre scale, because so far we have 
attached no meaning to fractions of a scale-interval. Strictly 
speaking, the measures of length we have considered arise 
when the object exactly stretches from one scale-division to 
another. We cannot say at once that an object covers 25.4
intervals ; but we can say that it covers more than 25 and less 
than 26 intervals. This must be so, for ten such objects 
placed end to end will cover 254 intervals on the millimetre 
scale. If each covered 25 intervals all ten would cover just 
250 intervals; if each covered 26 intervals, the ten would 
cover 260 intervals. The question therefore arises, when an 
object has not the same length as an exact number of in- 
tervals on the scale, can we assign to it a measure? Evidently 
we can, in two different ways. We can read to the nearest 
whole number of scale-intervals. In that case we have to say
that, while 10 tenth-inch intervals and 25 millimetre intervals
have the same length, and lengths have the additive property, 
100 tenth-inch intervals and 250 millimetre intervals have 
not the same length. There is an apparent inconsistency, 
which can be removed only by recognizing that we are not 
dealing with a logical process, but with a physical law. We 
must admit the principle that our measures are liable to 
errors, arising in this case from the finite step of the instrument.
The measure “25 millimetre intervals” is an approximation
to the true length, not the actual length. The additive
property of lengths, in fact, is a physical law. So long as we 
are dealing with exact multiples of the scale-interval its truth 
is merely a matter of counting. But when we have recognized 
that every object has a length and that most objects do not in 
fact cover an exact number of scale-intervals, we have to 
choose between the additive law and the adoption of an exact 
number for a measure. The additive law being a simple one, 
we therefore retain it as expressing the relation that holds 
between the true values, as defined in the last chapter, and 
regard departures from it as errors. In the case just con- 
sidered, the measure of 25 scale-intervals has an error. The 
measure of ten similar lengths together is the same as that of 
254 millimetre intervals ; we retain the additive law and say 
that, since there is a length in each case, its measure can only 
be 25.4 millimetre intervals. This is the true value. When the
length of one object is given as that of 25 millimetre intervals,
we say that it has an error; if there are, as here, other means
of fixing the true value, we say that the error is -0.4 interval.
Length is not a mere matter of counting; fractions must be 
admitted. 
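
The apparent inconsistency can be put in numbers. The following sketch assumes, as in the example above, that 100 tenth-inch intervals match 254 millimetre intervals exactly; the variable names are illustrative only.

```python
from fractions import Fraction

# Assumed exact comparison of the two scales: 100 tenth-inch = 254 mm intervals.
MM_PER_TENTH_INCH = Fraction(254, 100)

true_one = 10 * MM_PER_TENTH_INCH   # true measure of one object, 25.4 mm intervals
reading_one = round(true_one)       # reading to the nearest whole interval
error_one = reading_one - true_one  # error of that reading

sum_of_readings = 10 * reading_one  # what ten readings would add up to
true_total = 10 * true_one          # what the additive law requires

print(float(true_one), reading_one, float(error_one))
print(sum_of_readings, true_total)
```

Keeping the additive law forces the fractional value on the single object; treating the whole-number reading as exact would make ten objects fall short of the 254 intervals actually observed.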

We can proceed as follows in finding a length. Suppose 
that the object to be measured is placed repeatedly against the 
scale, so that in each application the first end comes where the 
second end was in the previous application. In this way we, 
effectively, construct a new scale. At the mth application the 
total length overlapped is greater than that of r and less than
that of s scale-intervals ; that is, we classify the whole numbers
into two divisions, such that m applications of the object 
cover more of the scale than the number of intervals given 
by a number in the first division, and less of it than the 
number of intervals given by any number in the second 
division. If then we are to retain the additive property of 
length, we must say that m times the length of the object is 
greater than that of any number of intervals in the first class, 
and less than that of any number of intervals in the second 
class. Therefore the measure of the length of the object must 
be greater than that of r/m intervals and less than that of s/m
intervals ; and the greatest value of r is less by 1 than the least
value of s. The measure is therefore specified within a fraction
1/m of a scale-interval. By varying m we can then find a
series of intervals, each of which must contain the true value; 
alternatively, we divide the rational fractions into two sets, 
such that the number in the measure exceeds all in the first 
set and falls short of any in the second set. The true measure 
may then be any value between the largest in the first set and
the smallest in the second. If m could be indefinitely large 
in practice this procedure would specify a cut in the rational 
fractions and define a real number. Actually there is a limit 
to the length of the measuring scale, and a certain amount 
of arbitrariness survives. It might be true, as far as we can 
tell, that every length can be associated with a number of 
scale-intervals expressed by a rational number. 
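
The procedure of repeated application can be sketched as follows. The function `bracket` and the value chosen for the length are hypothetical; in practice the true length is of course unknown, and only the two bounds are read from the scale.

```python
from fractions import Fraction
from math import floor

def bracket(true_length, m):
    """Bounds r/m and s/m obtained from m end-to-end applications of the object."""
    total = m * true_length  # length covered by m applications
    r = floor(total)         # greatest whole number of intervals not exceeding it
    s = r + 1                # least whole number of intervals beyond it
    return Fraction(r, m), Fraction(s, m)

length = Fraction(127, 5)    # hypothetical true length, 25.4 intervals
for m in (1, 2, 3, 10):
    low, high = bracket(length, m)
    print(m, low, high, high - low)
```

Each pair of bounds differs by 1/m of a scale-interval. When m applications happen to end exactly on a scale-division (here m = 10), the lower bound coincides with the length itself, a case the strict inequalities of the text set aside.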

But this principle becomes untenable when we consider 
more complicated laws. We find for instance that the square 
of the hypotenuse of an isosceles right-angled triangle must 
be twice the square of either side. If the measure of the side 
can be associated with a rational number, the only number 
that can be associated with the hypotenuse is √2 times that
number, and √2 times a rational fraction cannot be rational.
We need more numbers than rational fractions to keep our 
laws formally true. But if we admit all real numbers there is 
no difficulty.
The true length of an object corresponds then to a real
number of scale-intervals. Now suppose that the object is 
compared with two different scales. The associated numbers
are l and l′. Another object is compared with the same scales,
giving numbers m and m′. Then we must have

$$\frac{l'}{l} = \frac{m'}{m}.$$

For suppose that we place the objects in steps along the
scales, the first being repeated p times and the second q times,
and consider the length of the object specified by going from
the last mark on the first new scale to the last on the second.
This has a length, on the first scale, equal to pl - qm; if this
is negative it means that we have to go backwards. On the
second scale we get similarly pl′ - qm′. But if l/m and l′/m′
were unequal we could find such values of p and q that q/p
would lie between them. Then pl - qm and pl′ - qm′ would
have opposite signs and we should have to go in opposite
directions in the two cases to get from the end of one derived
scale to the end of the other. But the objects specified by repeating
the first p times and the second q times have definite
lengths ; the greater length will be the greater length whatever
scale is used. Hence pl - qm and pl′ - qm′ must always have
the same sign. Therefore l/m = l′/m′, or l′/l = m′/m. The
numbers associated with any length on two scales are in a 
fixed ratio depending on the scales and not on the object itself. 
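
The argument by contradiction may be made concrete. In the sketch below the function `conflicting_steps` and the numbers fed to it are hypothetical; taking q/p as the mean of the two ratios is merely one convenient way of finding a fraction between them, not a choice made in the text.

```python
from fractions import Fraction

def conflicting_steps(l, m, l2, m2):
    """If l/m differs from l2/m2, return p, q and the two signed lengths
    p*l - q*m and p*l2 - q*m2, which then have opposite signs.
    Return None when the ratios agree."""
    a, b = Fraction(l, m), Fraction(l2, m2)
    if a == b:
        return None
    between = (a + b) / 2                          # a fraction q/p lying between the ratios
    p, q = between.denominator, between.numerator
    return p, q, p * l - q * m, p * l2 - q * m2

# Consistent measures: the ratios agree, so no conflict can be constructed.
print(conflicting_steps(100, 50, 254, 127))

# Inconsistent measures (invented): the two scales disagree about which
# repeated object is the longer.
print(conflicting_steps(100, 50, 254, 120))
```

In the second case the first scale makes p repetitions of the first object shorter than q repetitions of the second, while the second scale makes them longer, which is the contradiction the text excludes.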

6·4. Starting from the sensory notion of comparison of objects
by juxtaposition, we have obtained the notion of length
as a property by abstraction, and have shown how any length 
may be associated with a real number in relation to a certain 
scale. We can now proceed to the notion of length as a 
quantity. As a property of an object it is identified by a statement
of the form “the length of the given object, in comparison
with a millimetre scale, is specified by the number x”.
We write this in the form “the length of the object is x
millimetres”. On the face of it this statement is an abbreviation,
and can be understood only by reference to the previous
one and to the whole of the foregoing discussion. We have
nowhere said what we mean by “a millimetre” as a noun,
much less what we mean by “x millimetres”. We might
mean “the length of one interval on a millimetre scale”.
But length is a property, and we do not know what we mean
by multiplying a property by x. We might attempt to re-analyse
the statement by saying that “x millimetres” has
a structure analogous to that of “ten men”. Then it would
have to mean “a class of millimetres, whose number is x”.
But clearly a class of lengths is not the same thing as any
single length, even in the case where x is a whole number;
and if x is a fraction there is no such thing as a class of
number x.

There seem to be two possible attitudes to the statement 
“the length of the given object is x millimetres”. We can 
take it as simply an abbreviation ; if so there is nothing further 
to be said. But we may consider that “a millimetre” is something
that exists and can be freely multiplied by real numbers
to give other things of the same kind as itself. If so, “length”
in this statement is no longer a property of the object; x times
a property cannot in any sense be a property of the same kind.
“Length” is now a new concept, called a quantity. There is 
no logical necessity for the existence of quantities; but for 
practical convenience of statement they are useful. The 
fundamental postulate of the theory of quantities is : 

If the measure of a quantity is x, and the quantity is multiplied
by the number y, we obtain a new quantity of the same
kind whose measure is xy.

On our first analysis this is equivalent to : 

If a property of an object, in comparison with a certain 
scale, is associated with the number x, and the property of 
the scale-interval on the first scale, when compared with a 
second scale, is associated with the number y, then when the
property of the object is compared with the second scale it is
associated with the number xy.

Here the measure of an interval of the first scale in terms
of the second is y, and we have obtained a measure of x intervals
of the first scale as equivalent to xy intervals of the
second.

This proposition is true for fundamental magnitudes in 
consequence of 6·3. The importance of the expression of it
in terms of quantities may be illustrated by reference to 
length. Suppose an object is measured in terms of a tenth-inch
scale and that the associated number is x. We express
this by saying that “the length of the object is x tenths of
an inch”. We measure an interval on the tenth-inch scale in
terms of a millimetre scale, and find that the ratio of the two
associated numbers is y. This, by 6·3, is the same for all
intervals, and in particular when the interval is the interval
between consecutive divisions on the tenth-inch scale. Then
the length of the object, in comparison with the millimetre
scale, is associated with the number xy, and we express
this in the language of quantity by saying that “the length of
the object is xy millimetres”. That is, the statements “the
length of the object is x tenths of an inch” and “the length
of the object is xy millimetres” are completely equivalent for
all values of x. In any proposition containing the expression
“tenths of an inch” we can therefore replace every tenth of
an inch by “y millimetres” without affecting the truth of the
proposition. In this language, therefore, a tenth of an inch
and y millimetres are completely equivalent ideas, and we
can say

    a tenth of an inch = y millimetres.

It is this proposition that provides the usual rule for con- 
version of units from one scale of measurement to another. 
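
As a minimal sketch of the rule of conversion, take y = 2.54, the value found above for the tenth-inch and millimetre scales. The function name is an assumption made for illustration.

```python
from fractions import Fraction

# y: the measure of one tenth-inch interval on the millimetre scale.
Y_MM_PER_TENTH_INCH = Fraction(254, 100)

def tenths_of_inch_to_mm(x):
    """Replace 'x tenths of an inch' by 'x*y millimetres'."""
    return x * Y_MM_PER_TENTH_INCH

for x in (1, 10, 100, Fraction(3, 2)):
    print(x, "tenths of an inch =", float(tenths_of_inch_to_mm(x)), "mm")
```

The same multiplier serves for every x, whole or fractional, which is what allows “a tenth of an inch” to be replaced by “y millimetres” in any proposition.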

Similar considerations apply to any fundamental magni- 
tude. In the case of mass the actual boxes of weights used 
introduce a slight complication. We do not in practice weigh, 
for instance, in terms of milligram weights alone. We use 
weights found by experiment to be equivalent, in regard to 
objects counterpoised by them, to various multiples of the unit.
The process is equivalent to measuring a length in terms of
decimetres, centimetres, and millimetres and using the known 
standards of comparison of the various units to reduce the 
whole to millimetres. 

6·5. The majority of physical magnitudes are not measured
directly. They occur as factors of proportionality in laws. 
Probably the only laws that do not involve such factors are 
those of simple constancy, and those expressing addition of 
measures of fundamental magnitudes in terms of the same 
scale. Nevertheless they may be connected with properties. 
For instance, liquids may be classified according to whether 
a given solid sinks or floats in them ; and this relation is un- 
affected by the size and shape of the containing vessel, so long 
as the solid does not actually touch the sides. Using different 
solids we can classify liquids in terms of each. This method 
establishes an order among liquids and solids. It is found 
that if we have made the classification in terms of one solid 
and then try another, the latter either sinks in all the liquids 
that the first sinks in, or floats in all those that the first floats 
in. There may be an intermediate group such that one solid 
floats in them but the other sinks. The liquids may therefore 
be arranged in an order such that each supports all solids 
supported by those before it in the series, but will not support 
some solids supported by liquids after it in the series. Then 
each liquid is said to have a greater density than those that 
precede it and a smaller one than those that follow it. We 
have abstracted from the empirical relation the property of 
density. The process resembles in outline that of abstracting 
the notion of length from the results of juxtaposition. But 
the analogy breaks down at the next stage. We cannot con- 
struct a scale of comparison for density by combining objects. 
In dealing with length we could put two millimetre intervals 
in succession and call the length of an object that fits the two 
together 2 millimetres; in dealing with time a process such 
that a seconds’ pendulum swings twice during it is said to
occupy two seconds. In each case two of the standard intervals
together are greater, in terms of the method of comparison,
than either separately. It is this fact that makes it
possible to construct a scale. But in the case of density, when 
we put together two of the solids used for comparison, the 
combined solid does not determine a cut in the series of 
liquids outside those determined by the two solids separately ; 
in general the cut it gives lies between the two former ones. 
There is no way of constructing a scale based on a single 
solid; and the measurement of density as a fundamental 
magnitude breaks down. But we can weigh a portion of 
a liquid on a balance, and find its volume by means of a 
measuring glass. Both volume and mass are fundamental 
magnitudes, and when the process is carried out several 
times on different portions of the same liquid it is found that 
they are in fact proportional; therefore they are connected 
by a differential equation of the form 

$$\frac{dy}{dx} = \frac{y}{x}. \qquad (1)$$

This is a very simple equation, and its truth can be established
with practical certainty by a very few trials. Its
solution is

$$y = Ax, \qquad (2)$$

where A is what is known in pure mathematics as an arbitrary
constant. What actually happens is that the integrated form is 
the first to be verified, and A is determined in the process of 
verification. But the actual observed values do not fit the 
form (2) exactly, but approximately. Nevertheless, since the 
law (2) is equivalent to the simple differential equation (1) we
say that the equation represents a physical law, expressing 
the relation between volume and mass in portions of the 
same liquid. The arbitrariness in the solution is found to 
correspond to the differences between different liquids; all 
give the form (1), but in (2) the quantity A has different
values for different liquids and therefore expresses a property
of the liquid. We can, that is to say, arrange liquids
according to the values of A they give, and then give a name
to A. It is then found that the order of increasing values of
A is also the order specified by the results of flotation experi- 
ments. We then have a quantitatively determined value, the 
mass per unit volume, such that greater or less mass per unit volume
among liquids corresponds completely to greater or less
density. In this way we can attach a numerical value to
density. 
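
The determination of A in the process of verification can be sketched as below. The readings are invented purely for illustration, and the use of least squares for a line through the origin is an assumption of this sketch; the text says only that the observed values fit the proportional form approximately.

```python
def fit_constant(volumes, masses):
    """Least-squares value of A in mass = A * volume (a line through the origin)."""
    return sum(v * m for v, m in zip(volumes, masses)) / sum(v * v for v in volumes)

# Hypothetical readings (volume in cm^3, mass in g), invented for illustration.
readings = {
    "liquid_a": ([10, 20, 30], [10.1, 19.9, 30.0]),
    "liquid_b": ([10, 20, 30], [13.5, 26.7, 40.3]),
}

constants = {name: fit_constant(v, m) for name, (v, m) in readings.items()}

# Arrange the liquids in order of increasing A.
order = sorted(constants, key=constants.get)
print(constants)
print(order)
```

Each liquid yields its own value of A, and the liquids can then be arranged by it; the text asserts that this order agrees with the one found by flotation.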

Density is an example of a derived magnitude. It is a pro- 
perty capable of being ordered, but not directly measured 
in terms of a single scale. A series of experiments must be 
conducted, such that in each experiment two fundamental 
magnitudes are measured ; and the measures are found to be 
connected by a simple differential equation. This is then taken 
as the physical law. An adjustable constant emerges in the 
solution, and we call this constant the density. In general 
it appears that derived magnitudes are the adjustable constants 
that arise in the solution of the differential equations of physics. 
In the simple case of the comparison of two scales of measure- 
ment we have already introduced a derived magnitude, by 
saying that the length of an object is 2.54 mm. for every
tenth of an inch. Here we have begun by establishing a rule
of proportionality valid for any two scales, and have found
the number 2.54 as the actual one applicable to the particular
pair of scales chosen. But its character is less evident than in 
the case of density because the properties it enables us to 
compare are merely two different ways of specifying the same 
thing, the length of a given object. In the case of density the 
derived magnitude provides a means of connecting two quite 
distinct properties of a portion of the liquid, namely its 
volume and its mass. 

In some sense a derived magnitude is measured in terms 
of a number, but its structure is more complicated than that 
of a fundamental magnitude. The units used in determining 
the various fundamental magnitudes involved are obviously 
reflected in the number that appears in the measure of the
derived magnitude. The number attached to a density as a
mass per unit volume will depend on whether the mass is 
measured in grams or pounds, and the volume in cubic 
centimetres or cubic inches. When we say that a density is
1.34 grams per cubic centimetre, the expression “1.34 grams
per cubic centimetre” is a complete entity; no item in it,
neither “1.34”, nor “grams”, nor “cubic centimetre”, can
be changed without altering the meaning of the whole. For
this reason it is incorrect to speak, as is done in many writings
on the theory of dimensions, of a “mere change of units”.
There is no such thing as a mere change of units. If we alter 
a unit without altering the number in the measure, we are 
speaking of a different physical system, and cannot assert 
anything about it without a physical law to guide us ; while 
if we already know the law a change of units tells us nothing 
that we cannot find out by keeping the same units and 
altering the numerical measure*. 

In discussing length we began with length as a property of 
an object and led up to the idea of length as a quantity. 
Between the two a one-one correspondence exists. If two 
objects are different in the property, as tested by direct 
juxtaposition, they have different measures, and conversely. 
Similarly for density, we may regard it as a property, differ- 
ences in which are tested by the method of flotation, or as a 
mass per unit volume, the mass and the volume being both 
measured as fundamental magnitudes. A one-one corre- 
spondence exists between the property and the measure. If 
we are to proceed to consider density as a quantity we must 
verify that its measure satisfies our fundamental law for quan- 
tities. To do this, consider a given portion of a substance, and 
measure both its mass and its volume in terms of two different 
scales. Suppose the numbers associated with the mass on 
the two scales to be m and m′, and those associated with the
volume v and v′. Then m′/m is μ, the measure of the interval
of the first mass-scale in terms of the second, since mass is a
fundamental magnitude; and v′/v is f, the measure of the
interval of the first volume-scale in terms of the second, since
volume is a fundamental magnitude. Hence

$$\frac{m'}{v'} = \frac{\mu}{f}\,\frac{m}{v}.$$

But m/v and m′/v′ are the numbers associated with the densities
on the two pairs of scales. If the density of a substance
on the first pair of scales is associated with the number unity,
then on the second pair it is associated with the number μ/f.
Our equation enables us to extend this by saying that if the
numbers associated with the density on the two pairs of
scales are ρ and ρ′, then ρ′/ρ, for all values of ρ, is the number
associated on the second pair of scales with the density of a
substance associated on the first pair of scales with the number
unity. This shows that density actually does satisfy the rule
required.

* For this reason the so-called “method of dimensions” is fallacious. It
should be replaced by that of similarity, as Campbell has explained (loc. cit.).

We saw that in any proposition about length we could replace
a tenth of an inch by 2.54 mm. without affecting its
truth or falsehood. Thus “x tenths of an inch” and “2.54x
millimetres” express the same length, whatever x may be.
Now when the scales are specified “ρ units of mass per unit
of volume” expresses a definite density. Consider then a
portion of the substance with a volume expressed by v and a
mass expressed by ρv on the first pair of scales. We can say
that its volume is v of the first volume-units, and its mass ρv
of the first mass-units, using now the language of quantity.
Also its volume is fv of the second volume-units, and its
mass is μρv of the second mass-units. We therefore have, for
all values of v, the result that “ρv of the first mass-units per
v of the first volume-units” expresses the density of the same
portion of substance as “μρv of the second mass-units per fv
of the second volume-units”. But since mass and volume in
the same substance are proportional this implies that a density
expressed by “ρ of the first mass-units per first volume-unit”
is the same as one expressed by “μρ/f of the second mass-units
per second volume-unit”. We can therefore replace any
density ρ in terms of the first pair of scales by μρ/f in terms of
the second pair. We can also, if we like, regard density now
as the ratio of a mass to a volume. For if we choose to introduce
the concept of the ratio of two quantities neither of
which is a number, we can write the following equations :

    v first volume-units = fv second volume-units,
    ρv first mass-units = μρv second mass-units.

Hence by division

$$\frac{\rho v \text{ first mass-units}}{v \text{ first volume-units}} = \frac{\mu\rho v \text{ second mass-units}}{f v \text{ second volume-units}}.$$

But the constancy of the ratio of the numerical measures of
mass and volume in the same substance entitles us to cancel
the factor v in both ratios. Also if we call “one mass-unit per
unit of volume” a “unit of density”, we have

    ρ first density-units = μρ/f second density-units,

which gives the correct law of conversion from the first pair 
of scales to the second. The notion of the ratio of two quan- 
tities of different kinds, though it resembles that of quantity 
itself in having no logical reason for its existence, can actually 
be shown to lead to correct answers, and is therefore justifi- 
able on the ground of convenience. Every proposition con- 
taining it can if desired be reinterpreted in terms of more 
fundamental ideas and then verified. 
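
The law of conversion may be checked numerically. In the sketch below the first pair of scales is taken to be grams and cubic centimetres and the second pounds and cubic inches; this choice, the function name, and the sample portion are assumptions made for illustration. The constants used are the defined values 1 pound = 453.59237 grams and 1 inch = 2.54 centimetres.

```python
import math

GRAMS_PER_POUND = 453.59237     # defined value of the pound in grams
CM3_PER_CUBIC_INCH = 2.54 ** 3  # from 1 inch = 2.54 cm

mu = 1 / GRAMS_PER_POUND        # measure of one gram on the pound scale
f = 1 / CM3_PER_CUBIC_INCH      # measure of one cubic centimetre in cubic inches

def convert_density(rho, mu, f):
    """Number on the second pair of scales for a density rho on the first."""
    return mu * rho / f

rho = 1.34                      # grams per cubic centimetre, as in the text
v = 100.0                       # volume of a hypothetical portion, in cm^3

# Reinterpretation in fundamental terms: convert mass and volume separately.
mass_second = mu * (rho * v)
volume_second = f * v
direct = mass_second / volume_second

via_rule = convert_density(rho, mu, f)
print(direct, via_rule)
```

The two routes agree to within rounding, whatever volume v is chosen, which is the sense in which the ratio of two quantities of different kinds leads to correct answers.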

Derived magnitudes, in comparison with fundamental ones, 
are less immediately related to sensation, but more general in 
application. Thus a specimen of a given liquid may have any 
mass or volume, but each of these fundamental magnitudes 
is directly determinable in terms of a standard and a definite 
method of comparison. Their ratio, however, is always the 
same (with precautions, if necessary, about keeping the tem- 
perature and pressure constant). The density is not directly 
measured, but remains the same for the same liquid however
we may vary the volume. Its very existence depends on the
truth of the physical law that the ratio of the measures of the 
mass and the volume is constant, and therefore on the truth 
of the differential equation

$$\frac{d}{dv}\left(\frac{m}{v}\right) = 0,$$

which holds for any liquid. The constant ρ that occurs in the
solution of this equation, namely

$$m = \rho v,$$

exists in consequence of the differential equation, which contains
no quantity not measured directly ; ρ is independent of
the volume and therefore is more general in its application 
than the original data, but there is nothing in the work to 
indicate that it should be the same for different liquids, and 
it is in fact found to be different for different liquids. 


6·6. Let us return to the question of the solid of revolution
rolling down an inclined plane. It was found that the displacement
was proportional to the square of the time, satisfying
the equation

$$x = 0.20\,t^2, \qquad (1)$$

where x is measured in centimetres and t in seconds. The
coefficient 0.20 is a ratio found by experiment to fit a number of
observations and therefore represents a derived magnitude.
In consequence of our earlier discussion of the probability of 
physical laws we cannot admit that a numerical constant in 
a law, in its ultimate form, is capable of continuous variation. 
But we can remove the constant by writing the law in any of 
the differential forms 

$$\frac{d^3x}{dt^3} = 0, \qquad \frac{d}{dt}\left(\frac{x}{t^2}\right) = 0, \qquad \frac{dx}{dt} = \frac{2x}{t}, \qquad (2)$$

of which the last two are equivalent. The first has the general 
solution 

$$x = a + ut + \tfrac{1}{2}ft^2, \qquad (3)$$

where a, u, and f are adjustable constants. The second and 
third have as their most general solution 

$$x = \tfrac{1}{2}ft^2 \qquad (4)$$

simply. In either case the constant ½f appears, and can be 
identified with the 0·20 of the actual experiment. But a and 
u are on a somewhat different footing. They resemble f in 
being constants of integration. But they are much more 
sensitive to the given experimental conditions. The form (4) 
is applicable only if x is measured from the initial position 
of rest and t from the time when the body is released. If the 
body is originally some way down the scale and moving, or 
if the stop-watch is not originally at zero, the form (4) is 
experimentally untrue, but (3) still holds with suitable values 
of a and u. But f is much more general in its application. 
Wherever the body is started we get the same value of f; a 
and u are more general than x and t, which vary from one 
single observation to another, but they vary from one 
experiment (series of observations) to another. 
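
The different footing of f on the one hand and of a and u on the other can be seen in a small numerical sketch. The Python below is an illustration only: the values of a and u are invented, the function names are our own, and the readings are generated from the law itself and not taken from any experiment. It uses the fact that for equally spaced times the second difference of x equals f multiplied by the square of the interval, whatever a and u may be.

```python
def second_difference_f(xs, dt):
    """Estimate f in x = a + u*t + f*t**2/2 from equally spaced readings.

    For such a law x[i+1] - 2*x[i] + x[i-1] equals f * dt**2,
    independently of a and u.
    """
    diffs = [xs[i + 1] - 2 * xs[i] + xs[i - 1] for i in range(1, len(xs) - 1)]
    return sum(diffs) / (len(diffs) * dt ** 2)


def positions(a, u, f, times):
    """Readings of x at the given times for assumed constants a, u, f."""
    return [a + u * t + 0.5 * f * t * t for t in times]


times = [0.0, 1.0, 2.0, 3.0, 4.0]
# Released from rest at the zero of the scale: a = 0, u = 0, f/2 = 0.20.
from_rest = positions(0.0, 0.0, 0.40, times)
# Started some way down the scale and already moving (hypothetical a and u).
moving = positions(5.0, 1.5, 0.40, times)

f_rest = second_difference_f(from_rest, dt=1.0)
f_moving = second_difference_f(moving, dt=1.0)
print(f_rest, f_moving)
```

The two series of readings differ at every instant, yet both yield the same f; a and u would have to be found afresh for each series.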

Actually we may repeat the experiment with a different 
inclination of the plane to the horizontal. It is then found that 
f itself is different, and is proportional to the sine of the 
inclination α. If we call this sine σ our law takes the simple form 

$$\frac{df}{d\sigma} = \frac{f}{\sigma}. \qquad (5)$$



But we may proceed to experiment with different solids of 
revolution, and we find that different solids give different 
values of f for the same inclination; in fact f is proportional 
to c²/(c² + k²), where c is the distance from the axis of the 
body to the line of contact and k the radius of gyration about 
the axis. Both c and k can be found by measurement. The 
result of the several series of experiments is that the quantity 

$$\frac{c^2 + k^2}{c^2}\,\frac{1}{\sin\alpha}\,\frac{d^2x}{dt^2} = g \qquad (6)$$

is constant for variations of the initial position and velocity, 
of the inclination, and of the form of the section of the body 
by a plane through the axis. It is therefore a quantity of 
much greater generality than the actual acceleration in any 
one experiment, and its existence really sums up at once not 
one but several differential equations. This, then, is the 
ultimate form of the law of the rolling of a solid of revolution, 
and the constant g in it is the most general derived magnitude 
obtainable from such experiments. It can be shown by other 
experiments to be the acceleration of a falling body and to be 
also a derived magnitude associated with the simple pendu- 
lum. These agreements are predictable from the laws of 
dynamics and constitute a verification of these laws. 
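
The reduction of the several accelerations to a single constant may be sketched numerically. The Python below is an illustration under stated assumptions: the accelerations are generated from an assumed value of g and are not observations, the ratios k²/c² are the standard ones for a uniform sphere, a uniform cylinder and a thin hoop, and the names are our own. It shows only that the quantity (c² + k²)/c² · f / sin α comes out the same for every combination, while f itself does not.

```python
import math


def g_from_rolling(f, alpha, c, k):
    """Return (c**2 + k**2) / c**2 * f / sin(alpha).

    f     : acceleration along the plane
    alpha : inclination of the plane to the horizontal, in radians
    c     : distance from the axis to the line of contact
    k     : radius of gyration about the axis
    """
    return (c * c + k * k) / (c * c) * f / math.sin(alpha)


# Assumed value used only to generate hypothetical accelerations (cm/s^2).
G_ASSUMED = 981.0
# Standard values of k**2 / c**2 for three uniform solids.
SOLIDS = {"solid sphere": 0.4, "solid cylinder": 0.5, "hoop": 1.0}

experiments = []
for alpha_deg in (2.0, 5.0, 10.0):
    alpha = math.radians(alpha_deg)
    for name, ratio in SOLIDS.items():
        c = 3.0
        k = c * math.sqrt(ratio)
        f = G_ASSUMED * math.sin(alpha) / (1.0 + ratio)
        experiments.append((name, alpha_deg, f, g_from_rolling(f, alpha, c, k)))

for name, alpha_deg, f, g in experiments:
    print(f"{name:14s} {alpha_deg:5.1f} deg  f = {f:8.3f}  g = {g:8.3f}")
```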

The quantities a and u are really derived magnitudes be- 
cause they still arise in the solution of the general equation (6) 
and have to be determined in each experiment so as to make 
the formal solution fit the observed values of x and t as closely 
as possible. But in each experiment they are at our disposal ; 
when one experiment is finished the values of a and u asso- 
ciated with it have no further application. Consequently 
they are not given in books of physical tables ; but g is always 
given. There are cases, however, where a and u or their 
analogues have a wide application. We do record the position 
of a star and its proper motion at the date 1900.0. The reasons 
are first, that these quantities are useful in finding the posi- 
tion of the star at any other date, years or centuries earlier or 
later; and second, that we cannot put the star back and start 
it off differently, so that the quantities are not at our disposal. 
In fact so far as experiments go the difference in character 
between a and u, f, and g is one of degree of generality; there 
is no fundamental qualitative difference. 

6·7. In all numerical work we make free use of mathematics; 
and the numbers that arise go far beyond the simple 
class-numbers that we start with. We have already seen, in 
considering the properties of the isosceles right-angled triangle, 
that we cannot even restrict the values of our quantities to 
rational multiples of the unit without sacrificing the truth of 
the laws of physics. The adopted values of the lengths of the 
sides, if obtained by making repeated measures to the nearest 
multiple of the scale-interval and taking the mean, will always 
be rational. We can preserve the exactness of the law only by 
admitting the existence of errors in the adopted values them- 
selves; and that implies the existence of an unknown true 
value behind the adopted value. This is the justification of 
the assumption we made in discussing errors of measurement, 
that there is a true value, which our observations may enable 
us to identify within limits, but never exactly. Further, the 
whole series of rational numbers is insufficient to specify all 
the possible true values ; and there seems to be no reason for 
not admitting the whole series of real numbers. 

This brings us to an attitude towards real numbers that 
seems to agree better with that of ordinary mathematicians 
than with that of Whitehead and Russell. The usual procedure 
in defining √2, for instance, would be to divide the rational 
fractions into two classes, such that the squares of those in 
one class were all less than 2, and of those in the other class 
greater than 2. The number separating the two classes is then 
called √2. The principle is known as Dedekind’s section. 
Whitehead and Russell, however, point out that there is no 
a priori reason to believe that there is any number that is 
greater than all numbers of the first class and less than all of 
the second class; Dedekind’s section assumes an existence 
without proof. Whitehead and Russell proceed by defining 
√2 as the class of all rational fractions whose squares are less 
than 2; this class exists in the same sense as other classes. 
But addition and multiplication of classes have to be redefined 
for non-rational numbers so as to keep the ordinary laws of 
algebra true. Thus √2 + √3 is to be defined as the class of 
the sums of all pairs of rational fractions such that one has a 
square less than 2 and the other a square less than 3, and 
√2 × √3 as the class of the products of pairs of rational 
fractions satisfying the same conditions. In this way they 
are able to establish the existence theorem for real numbers 
and develop their algebra without assumption. But from the 
physical point of view there is no apparent reason to believe 
that the numbers that occur in true values of variable 
quantities are really classes of rational fractions, while there 
is direct reason to believe that these numbers do exist and 
are different from rational fractions. From our point of view 
Dedekind’s assumption is therefore open to less objection 
than that of Whitehead and Russell. The utility of the latter 
is that it does establish what might otherwise be open to a 
certain amount of doubt, that there is no internal inconsistency 
in assuming the existence of non-rational numbers and 
applying to them the ordinary rules of algebra established 
for rational numbers. 
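
The definition by classes can be imitated with exact rational arithmetic. What follows is only a sketch in Python using the standard fractions module; the function names are our own, the classes are restricted to positive fractions for simplicity, and the search for a split is finite, so that it illustrates the definitions without proving anything. It brackets the separation between the two classes for 2 by rationals, and tests whether a given rational belongs to the class defining √2 + √3.

```python
from fractions import Fraction


def in_lower_class(q, n):
    """True if the positive rational q has a square less than n."""
    return q > 0 and q * q < n


def in_sum_class(q, steps=200):
    """Search for a split q = r + s with r*r < 2 and s*s < 3.

    A finite search over r = i/steps is only a sketch: it can confirm
    membership, but failure to find a split is not a proof of exclusion.
    """
    for i in range(1, 2 * steps):
        r = Fraction(i, steps)
        s = q - r
        if in_lower_class(r, 2) and in_lower_class(s, 3):
            return True
    return False


def bracket_sqrt2(rounds):
    """Bisect between a member and a non-member of the lower class for 2."""
    low, high = Fraction(1), Fraction(2)
    for _ in range(rounds):
        mid = (low + high) / 2
        if in_lower_class(mid, 2):
            low = mid
        else:
            high = mid
    return low, high


low, high = bracket_sqrt2(20)
print(low, high, float(high - low))
print(in_sum_class(Fraction(3)))       # a split exists, e.g. 1.4 + 1.6
print(in_sum_class(Fraction(63, 20)))  # 3.15 is too large for any split
```

However far the bisection is carried, both ends of the bracket remain rational and neither has a square equal to 2; the classes themselves are what the definition takes as the number.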
MENSURATION 

’Tis distance lends enchantment to the view. 

Thomas Campbell, Pleasures of Hope 

7·1. We are now in a position to begin the discussion of the 
most fundamental physical science, that of the relations 
between lengths. We shall call it mensuration. It requires to 
be distinguished at the outset from the subject known to pure 
mathematicians as geometry. The latter is a branch of pure 
logic. It proceeds by taking a number of general axioms, ir- 
respective of whether these are physically tested or capable 
of being tested, and develops their consequences by purely 
logical rules. Physical measurement plays no part in it. For 
us, physical measurement is the whole raison d’être of the 
subject. By comparing our measurements we establish cer- 
tain laws ; these lead to generalizations, which in many cases 
resemble the axioms of forms of geometry. But the structure 
is essentially different. In geometry the laws are assumed a 
priori, and the particular results are consequences of the laws. 
In mensuration the particular results are the essence of the 
matter, and the general laws are derived from them by a 
process of generalization based on the simplicity postulate. 

It might nevertheless appear that, in spite of the opposite 
modes of approach, the total content of mensuration and 
geometry might be the same, the axioms of geometry being 
the same as the laws of mensuration. But this is not the 
case. 

All projective and descriptive geometries can evidently be 
ruled out at once. A requirement of all such geometries is 
that no notion analogous to distance is to be used. Since 
distance forms our subject-matter, there is no common ground 
whatever. 

Euclid’s geometry is the closest existing analogue of 
mensuration, and requires a full discussion. The notion of length 
is freely used in it. His points are sufficiently like what we 
have so far called the ends of objects, and we can produce 
close enough physical analogues of his straight lines, planes, 
and circles. He freely uses the principle of juxtaposition as 
a test of whether one quantity is greater or less than another ; 
here he follows the ordinary physical method of comparing 
lengths. 

Nevertheless his system differs from any possible system 
of mensuration ; in fact it is neither a mensuration nor a geo- 
metry, but a mixture of the two. For instance, he uses com- 
passes to draw circles, a legitimate physical procedure, but 
refuses to use them to transfer distances. In I (2), when he 
wishes to draw from a given point a line whose length is equal 
to that of a given straight line not through the point, he makes 
a complicated construction to avoid having to lift up the com- 
passes and transport them. Yet in I (4), in testing the equality 
of two triangles, he supposes one picked up bodily and super- 
posed on the other. The ordinary properties of rigid bodies 
are supposed to be possessed by triangles (drawn on pieces of 
paper) but not by a pair of compasses. The usual criticism 
from the geometrical standpoint is to reject the proof of I (4) 
and provide a new one; from the physical one the proof of 
I (4) is valid in certain conditions, though the result is true 
even when the construction involved cannot be carried out, 
but the complication of I (2) avoids only a difficulty that does 
not exist. 

Euclid postulates further that any two points can be joined 
by a straight line. The physical analogue of this is often true, 
but not always. The points may be on the surface of a convex 
body too hard to be bored. Yet the distance between the 
points exists, for it can be measured by applying compasses 
first to the two points and then to a scale. Physically the 
notion of the distance between two points is more general 
than that of the straight line joining them. 

The most important departure of Euclid’s treatment from 
any possible account of mensuration, however, is in the dis- 
cussion of parallels and the related propositions. We may refer 
to the second postulate, that a straight line can be produced 
to any length, however great, and to the fifth postulate, also 
called the twelfth or parallel axiom. Both of these postulates 
have been criticized by modern geometers as not obvious. 
In mensuration, on the other hand, they are not only not 
obvious but demonstrably false. We cannot produce a phy- 
sical straight line to a length greater than one determined by 
the size of the body it is drawn on; it may be extended by 
fastening other bodies on, but there is a limit to this process, 
and therefore to the length of the line. Again, it may be 
possible to find out by our existing methods of measuring 
angles that, when one straight edge crosses two others, it 
makes the sum of the interior angles less than two right 
angles, but it does not happen in all such cases that the two 
straight edges it crosses intersect; for in practice they often 
cannot be made long enough, or they may not be in one plane 
— a detail not allowed for in the usual statement of the 
postulate. 

The alternative known as Playfair’s axiom does not meet 
the difficulty, for it is not true that of two intersecting straight 
edges at least one must intersect any other ; Playfair’s parallel 
axiom fails in just the same way as Euclid’s. 

Criticisms of Euclid’s Elements have usually been made 
from the geometrical standpoint and not from the physical 
one, and its virtues from the one are usually its vices from the 
other. His test of equality is always superposition. Two 
lengths are equal if one can be superposed on the other. The 
same applies to two angles. Two areas are equal if one can 
be cut up and the pieces placed so that they exactly cover the 
other. These are physical methods of comparison ; and just 
for that reason they are rejected by geometry. His procedure 
is such that numerical measures do not arise; addition and 
subtraction of quantities are done on the actual objects 
themselves. He thereby sacrifices the convenience of being able 
to resort to algebra; but he also avoids a trap. Euclid would 
never have said that a length of 1·5 cm. is converted into one 
of 1·5 in. by a “mere change of units”; nor would he have 
said that the mass of the sun is 1·5 kilometres. 

The word “geometry” literally means the measurement of 
the earth, and Euclid’s predecessors were doubtless largely 
inspired by the needs of surveying. By this time, how- 
ever, the name has become so closely connected with the 
branch of pure mathematics that it seems hopeless to rescue 
it. Nor is it, I think, worth while. The measurement of the 
earth is now generally known as “geodesy”, and what we 
need is a word to describe the theory of measurement of 
length in general, not merely in relation to the earth. 
“Mensuration” seems entirely satisfactory, saying neither more nor 
less than it actually means. 

It is a fact that when Euclid’s theory gives a quantitative 
result, and the relevant construction can be carried out, the 
result is always found to be physically correct*. Nevertheless 
his axioms assume so many things possible that are in fact 
physically impossible that a radical reconstruction is needed. 
The modern physicist will not share his antipathy to numerical 
measurement, and will recognize in his treatment of angles 
and areas a perception that these, like length, are fundamental 
magnitudes. If he accepts the notion of quantity he will not 
refuse to say that a square centimetre is literally the square 
of a centimetre; but it is not strictly necessary to say so. The 
question that does actively arise at the outset, however, is 
whether we should introduce from the start any fundamental 
magnitudes besides length. Euclid assumes in I (13) that if a 
pencil of coplanar lines is drawn through a point, the angle 
between the extreme lines is equal to the sum of those 
between consecutive lines of the pencil; and in I (4) he supposes 
that angles that can be superposed are equal. These postulates 
make it possible to construct a scale for measuring angles in 
terms of a unit. Such a scale we may at once call a protractor, 
and angle is a fundamental magnitude. Again, in I (35) he 
compares the areas of parallelograms by cutting them up and 
superposing them, and his later work with triangles and 
rectangles indicates that area also is a fundamental magnitude. 
There are therefore three different fundamental magnitudes 
in the theory, and in the development they continually 
influence one another. All can be shown to exist in the sense 
that their measurement can actually be carried out, and there 
is no theoretical objection to developing the theory of all 
together. But there is a practical objection. Angles and areas 
can actually be superposed only in special cases; projections 
on the bodies that carry them usually interfere with the 
superposition. Again, the addition of angles or areas is meaningless 
unless they are placed in the same plane; thus the direct 
measurement of either depends on the existence of physical 
planes, whereas the measurement of distance by means of 
compasses and scale is independent of the existence of planes. 
Since distance is much more generally measurable directly 
than either angle or area it is desirable to develop the theory, 
if possible, on the basis of the properties of distance alone. 

* Except in the extreme case of the displacement of star images by 
the sun’s gravitational field. 

7·2. Mensuration deals essentially with the relations between 
measurements of distance on rigid bodies. It may be 
suggested that before it can be discussed we should define the 
terms “distance” and “rigid bodies”. Now the requirement 
of a definition is that it must make it possible to recognize 
the defined object when it actually occurs. It is of no value 
to say that a rigid body is one such that the distances between 
all the points of it are unaltered by any displacement, nor to 
define relative motion as change of distance between parts 
of a system, unless we have some way of recognizing when 
distances are altered. Distance, again, cannot be defined in 
terms of the properties of rigid bodies unless we have first 
some way of recognizing the rigid body when we meet it. 
None of these notions can be defined in terms of the 
properties of “space”, because we have no means of recognizing 
space directly; distance in space, for instance, cannot be 
determined except through measurements, which at once re- 
introduce material scales, which the reference to space was 
intended to avoid. 

The solution seems to be that neither “rigid body” nor 
“distance” is directly recognizable, and that both are derived 
from still more elementary notions, several experimental facts 
being used in the process. Let us start from the notion of a 
body, without considering how we arrive at this concept. It 
is a fact that we can make permanent marks on bodies, which 
we can recognize afterwards. It may be found that if we have 
two marks A, B on one body and two others, C, D on another, 
we can place the bodies so that A coincides with C and B 
with D. If a pair of compasses or calipers is adjusted so that 
one point coincides with A and the other with B, then it can 
be transported without readjustment and placed so that one 
point coincides with C and the other with D. All pairs of 
marks that can be fitted by the compasses in the same 
adjustment are classified together; we abstract the common 
property of distance, and say that all such pairs are equidistant. 
It may happen that two equidistant pairs can be superposed 
directly; but this is not always possible, because material 
obstructions may interfere. It is clear, in particular, that 
different pairs of marks on the same body cannot be super- 
posed without deforming the body, even if they are equi- 
distant. Now when a fit of pairs of marks has been obtained, 
either directly or through the use of compasses, it may be 
found that a fit is always obtained again in any subsequent 
trial. If this holds for numerous pairs of marks on the same 
body, we can generalize it as a law for that body. Such a 
body is called rigid. Compasses are rigid bodies provided 
their adjustment is not altered. If there is a doubt as to 
whether the adjustment has altered they can be tested by 
application to several pairs of marks that they previously 
fitted; and if they fail we can tighten up the hinge or get a 
new pair. In the first place distance is simply a property of 
pairs of marks on rigid bodies. 

It is also a fact that bodies can be made with edges ; if two 
bodies touch at two points they may touch at a continuous 
set of intermediate points. In general, when this is done, if 
we turn one or both of the bodies about so that they remain 
in contact at two given pairs of marks, the intermediate marks 
that were formerly in contact separate. But it is again a fact 
that bodies can be made with such edges that they do remain 
in contact at intermediate marks when they are turned about 
two coincident marks. When this has been found to hold in 
a number of trials it can be inferred with a high degree of 
probability that it will hold in any subsequent trial. In such 
cases we call the edges straight. 

The reservation must be made that the bodies, in both types 
of test, must receive only ordinary treatment. It is easy to 
recognize by the sensations that we call sensations of force 
when exceptional treatment is taking place. If bodies or 
edges fail to satisfy our tests we say that they are not rigid or 
not straight, or that exceptional treatment has taken place. 
In that case they do not form part of our present subject- 
matter. The important thing is that there are many bodies 
that do satisfy the conditions. If all compasses were made of 
rubber and all bodies of plasticine, this would not be so, and 
then perhaps there would be no science of mensuration ; but 
actually we can classify bodies and edges according as they 
do or do not satisfy our tests, and confine our attention for 
the present to those that do. The others are reserved for the 
subjects of mechanics. 

So far we have been able to define only identity of distance 
and not the meaning of greater and less in relation to distance. 
We need also to be able to establish a meaning for the state- 
ment that the distance AB between one pair of marks is 
greater than the distance CD between another pair. To do 
this it is necessary to be able to establish two other pairs of 
marks A′ and B′, C′ and D′, such that by our criterion the 
distance AB is the same as the distance A′B′, and the distance 
CD the same as the distance C′D′, and so that A′B′ and C′D′ 
are directly comparable. One method of comparison would 
be to say that A′B′ is greater than C′D′ if the compasses have 
to be set to a wider adjustment to fit the former. Another, 
which is more closely related to actual measurement, is to use 
the straight edge. If A′ and B′ are on a straight edge, there 
is a definite path from A′ to B′ along the edge. If the marks 
C′ and D′ are chosen on the same edge so that one lies 
between A′ and B′ and the other either also lies between them 
or coincides with one of them, then the part of the edge 
A′B′ includes the whole of the part from C′ to D′ with 
something over, and we say that the distance A′B′ is greater than 
the distance C′D′, and therefore that the distance AB is 
greater than the distance CD. The introduction of the straight 
edge in defining the meaning of greater than in relation to 
distance seems to be necessary because, while unequal dis- 
tances cannot in any case be superposed by moving the 
respective solids about, there are also cases where the dis- 
tances are really equal but cannot be superposed on account 
of the form of the solids, and we must provide ourselves with 
a means of distinguishing between real difference and mere 
failure to carry out a strict comparison. 

We can now proceed to the construction of a measuring 
scale on a straight edge and to the actual measurement of 
distance by the principles of the previous chapter. 

7·3. So long as marks lie on the same straight edge, the 
distances between them follow the simple rules of addition and 
subtraction. But we also require propositions connecting 
distances between marks not on the same straight edge. This 
brings us into a new domain, and at least one new experi- 
mental fact is needed to serve as a starting-point. As has 
already been indicated, propositions involving angles or 
planes should be avoided as far as possible until we can define 
them in terms of lengths. Our physical treatment must begin 
with an experimental law connecting distances between marks 
not on the same straight edge. 


7·31. Consider any three marks, O, X, Y, whose mutual 
distances are known, and consider the ratio 

$$\lambda = \frac{OX^2 + OY^2 - XY^2}{2\,OX \cdot OY}. \qquad (1)$$

If O, X, and Y lie on the same straight edge, and O is not 
between X and Y, then 

$$XY = |OX - OY|, \quad\text{and}\quad \lambda = 1. \qquad (2)$$

If O, X, and Y lie on the same straight edge, and O is 
between X and Y, then 

$$XY = OX + OY, \quad\text{and}\quad \lambda = -1. \qquad (3)$$

If O, X, and Y are not on the same straight edge, λ in 
general is found by experiment to lie between ±1. 

But if X and Y lie on two rigidly fixed straight edges 
meeting in O, then wherever X and Y may be taken on these 
edges λ has a constant value; that is, λ is independent of 
both OX and OY. 
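
The content of this law can be illustrated by a short computation. The Python below is a sketch under an important assumption: the distances are generated from Cartesian coordinates, which already presuppose the Euclidean relations that the text sets out to establish, so the coordinates serve only as a stand-in for measurements with compasses and scale. The function names and the particular edges are our own.

```python
import math


def ratio(ox, oy, xy):
    """(OX^2 + OY^2 - XY^2) / (2 OX.OY) from three mutual distances."""
    return (ox * ox + oy * oy - xy * xy) / (2.0 * ox * oy)


def mark_on_edge(direction, distance):
    """Coordinates of a mark at a given distance from O along an edge."""
    return (direction[0] * distance, direction[1] * distance)


# Two fixed edges through O, modelled (as an assumption) by unit vectors.
edge_x = (1.0, 0.0)
edge_y = (math.cos(1.0), math.sin(1.0))

values = []
for ox in (1.0, 2.5, 7.0):
    for oy in (0.5, 3.0, 4.2):
        x = mark_on_edge(edge_x, ox)
        y = mark_on_edge(edge_y, oy)
        values.append(ratio(ox, oy, math.dist(x, y)))

print(values)                  # the same value wherever X and Y are taken
print(ratio(5.0, 2.0, 3.0))    # one edge, O not between X and Y
print(ratio(5.0, 2.0, 7.0))    # one edge, O between X and Y
```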

This proposition lacks the chief requirement of a postulate 
in a geometry, namely that of possessing a naïveté that disarms 
suspicion. But for our purpose what matters is that there 
should be a practical way of ascertaining whether it is true ; 
and it is capable of test in almost all cases, and such test has 
already been carried out in countless experiments in practical 
plane “geometry”. It has perhaps not been tested directly 
with the full accuracy of modern measuring apparatus, but 
enough has been done to establish it in an enormous number 
of cases. It is not extremely simple in form, but the number 
of verifications is so great that if it has any appreciable prior 
probability the probability of all inferences from it must 
amount to practical certainty. We therefore suppose it to 
hold in general and attempt to develop its consequences. 

7·32. Instead of using the ratio λ itself, it is convenient to 
work with a certain function of it. We put 

$$\lambda = \cos\alpha, \qquad (4)$$

where the cosine is defined as in works on analysis. This 
defines a value of α less than π. Also since λ is independent 
of the actual values of OX and OY, its value expresses a 
property of the pair of edges OX and OY as wholes and not of 
any particular marks on them. The same applies therefore to 
α. We denote α usually by ∠XOY and call it the angle 
between OX and OY. Then (1) is equivalent to 

$$XY^2 = OX^2 + OY^2 - 2\,OX \cdot OY \cos XOY. \qquad (5)$$

This is practically Euclid ii, 12 and 13. 

7·33. It may happen that λ = ±1 when the three marks do 
not lie on a straight edge. In any case when λ = ±1 we 
call the marks collinear. If λ = −1 we say that O is between 
X and Y, and ∠XOY = π; if λ = +1, we say that O is not 
between X and Y, and ∠XOY = 0. It is also found that 
if O, X₁, X₂ are collinear, and O, Y₁, Y₂ are collinear, 

$$\frac{OX_1^2 + OY_1^2 - X_1Y_1^2}{2\,OX_1 \cdot OY_1} = \frac{OX_2^2 + OY_2^2 - X_2Y_2^2}{2\,OX_2 \cdot OY_2},$$

irrespective of whether the collinearities are given by actual 
straight edges. Also there is only one possible position Y 
collinear with O and X, such that OY has a given value and 
O is not between X and Y. Conversely in experiments on 
a laboratory scale, if O and X are already assigned, we can 
place Y so that OY has any given value. We can also 
generalize to collinear sets of points the principle 
corresponding to Euclid’s postulate that two lines cannot enclose 
a space, namely that as Y proceeds from O to X and beyond 
it there is only one possible path such that the points O, X, 
and Y are always collinear and along this OY increases 
continuously. 

We can now proceed to develop the theory*. 

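7·41. If A, B, C are three marks as in Fig. 1, we have 

$$\cos BAC = \frac{AB^2 + AC^2 - BC^2}{2\,AB \cdot AC}.$$

Write BC = a, CA = b, AB = c, and 2s = a + b + c. 
Then 

$$\cos BAC = \frac{b^2 + c^2 - a^2}{2bc},$$

$$\sin \tfrac{1}{2}BAC = \left(\frac{1 - \cos BAC}{2}\right)^{1/2} = \left\{\frac{a^2 - (b - c)^2}{4bc}\right\}^{1/2} = \left\{\frac{(s - b)(s - c)}{bc}\right\}^{1/2},$$

$$\cos \tfrac{1}{2}BAC = \left(\frac{1 + \cos BAC}{2}\right)^{1/2} = \left\{\frac{(b + c)^2 - a^2}{4bc}\right\}^{1/2} = \left\{\frac{s(s - a)}{bc}\right\}^{1/2},$$

$$\tan \tfrac{1}{2}BAC = \frac{\sin \tfrac{1}{2}BAC}{\cos \tfrac{1}{2}BAC} = \left\{\frac{(s - b)(s - c)}{s(s - a)}\right\}^{1/2}.$$

Corresponding formulae for the other angles are obtained 
symmetrically. 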
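
The half-angle formulae can be checked numerically against the definition of the angle through its cosine. The following Python is a sketch only; the three distances are hypothetical and the function names are our own.

```python
import math


def half_angle_functions(a, b, c):
    """sin, cos and tan of half the angle BAC from the three sides.

    a = BC, b = CA, c = AB, and 2s = a + b + c.
    """
    s = (a + b + c) / 2.0
    sin_half = math.sqrt((s - b) * (s - c) / (b * c))
    cos_half = math.sqrt(s * (s - a) / (b * c))
    tan_half = math.sqrt((s - b) * (s - c) / (s * (s - a)))
    return sin_half, cos_half, tan_half


def angle_from_sides(a, b, c):
    """Angle BAC defined through its cosine."""
    return math.acos((b * b + c * c - a * a) / (2.0 * b * c))


a, b, c = 7.0, 5.0, 6.0   # hypothetical measured distances
angle = angle_from_sides(a, b, c)
sin_half, cos_half, tan_half = half_angle_functions(a, b, c)
print(sin_half, math.sin(angle / 2))
print(cos_half, math.cos(angle / 2))
print(tan_half, math.tan(angle / 2))
```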


[Fig. 1] 

[Fig. 2] 


7·42. By the formula for the tangent of the sum of two angles, 

$$\tan \tfrac{1}{2}(BAC + BCA) = \frac{\tan \tfrac{1}{2}BAC + \tan \tfrac{1}{2}BCA}{1 - \tan \tfrac{1}{2}BAC \tan \tfrac{1}{2}BCA} = \left\{\frac{s(s - b)}{(s - c)(s - a)}\right\}^{1/2} = \tan\left(\tfrac{1}{2}\pi - \tfrac{1}{2}ABC\right).$$

Hence ∠BAC + ∠ABC + ∠BCA = π. Euc. i, 32. 

* In the figures continuous lines denote actual straight edges; dotted 
lines connect only marks the distances between which are considered, 
but which need not be connected by actual straight edges. 

7·43. If B₁, O, B₂ are collinear marks (Fig. 2), and A is 
another mark, 

$$AB_1^2 = OA^2 + OB_1^2 - 2\,OA \cdot OB_1 \cos AOB_1, \qquad (1)$$

$$AB_2^2 = OA^2 + OB_2^2 - 2\,OA \cdot OB_2 \cos AOB_2, \qquad (2)$$

and also 

$$AB_2^2 = AB_1^2 + B_1B_2^2 - 2\,AB_1 \cdot B_1B_2 \cos AB_1B_2. \qquad (3)$$

But 

$$B_1B_2 = B_1O + OB_2, \qquad (4)$$

$$\cos AB_1B_2 = \cos AB_1O = \frac{AB_1^2 + OB_1^2 - OA^2}{2\,AB_1 \cdot OB_1} = \frac{OB_1 - OA \cos AOB_1}{AB_1} \qquad (5)$$

by (1). Substituting from (4) and (5) in (3) we have 

$$\begin{aligned}
AB_2^2 &= AB_1^2 + B_1B_2^2 - 2\,B_1B_2\,(OB_1 - OA \cos AOB_1) \\
&= OA^2 + OB_1^2 + B_1B_2^2 - 2\,B_1B_2 \cdot OB_1 + 2\,OA\,(B_1B_2 - OB_1) \cos AOB_1 \\
&= OA^2 + OB_2^2 + 2\,OA \cdot OB_2 \cos AOB_1, \qquad (6)
\end{aligned}$$

whence, comparing (2) and (6), 

$$\cos AOB_1 + \cos AOB_2 = 0, \qquad (7)$$

and therefore 

$$\angle AOB_1 + \angle AOB_2 = \pi. \qquad (8)$$

Euc. i, 13. 
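
The results of 7·42 and 7·43 can be checked numerically when every angle is computed from distances alone. The Python below is a sketch under an assumption that should be stated plainly: the marks are given coordinates merely to generate a consistent set of distances, and the angle function thereafter uses nothing but those distances. The names are our own.

```python
import math


def angle(o, x, y):
    """Angle XOY from the three mutual distances, by the cosine rule."""
    ox, oy, xy = math.dist(o, x), math.dist(o, y), math.dist(x, y)
    return math.acos((ox * ox + oy * oy - xy * xy) / (2.0 * ox * oy))


# Hypothetical marks forming a triangle.
A = (1.0, 2.0)
B = (4.0, 0.5)
C = (3.0, 5.0)
angle_sum = angle(A, B, C) + angle(B, A, C) + angle(C, A, B)

# B1, O, B2 collinear with O between B1 and B2; A is off the line.
B1, O, B2 = (-2.0, 0.0), (0.0, 0.0), (3.0, 0.0)
supplementary = angle(O, A, B1) + angle(O, A, B2)

print(angle_sum, supplementary, math.pi)
```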


7·44. It follows as an immediate corollary by Euclid’s method 
that when two straight edges cross the opposite angles are 
equal. Euc. i, 15. 

7·45. It also follows from 7·42 that 

$$\angle AB_1O + \angle B_1OA + \angle OAB_1 = \pi, \qquad (1)$$

$$\angle OB_2A + \angle B_2AO + \angle AOB_2 = \pi, \qquad (2)$$

$$\angle B_1B_2A + \angle B_2AB_1 + \angle AB_1B_2 = \pi. \qquad (3)$$

Adding (1) and (2) and subtracting (3), and cancelling 
identical angles, 

$$\angle B_1OA + \angle AOB_2 + \angle OAB_1 + \angle B_2AO - \angle B_2AB_1 = \pi. \qquad (4)$$

But by 7·43 

$$\angle B_1OA + \angle AOB_2 = \pi. \qquad (5)$$

Hence 

$$\angle OAB_1 + \angle B_2AO = \angle B_2AB_1. \qquad (6)$$

Thus angles so placed that they have one common arm and 
so that collinear points exist on the arms, and measured by 
the rule 7·32 (4), have the additive property. 

7·46. If two straight edges OA and OB meet in O, and if 
cos AOB is negative, and if O is not the end of OB, it is 
possible to make a mark B₁ on OB so that cos AOB₁ is 
positive. 

For by 7·43 we need only take B₁ on the side of O opposite 
to B, and the result follows. 


O 


A - 
r I 

I 

t 

I 

I 

t 

JL 


747 . If A be outside the edge OB, and if B be on that 
part of it where cos AOB is 
positive, and if OB is greater 
than OA cos AOB, then it is 
possible to make a mark C on 
OB such that 

OA^ = OC2 + AC\ 

For we can make a mark C 

on OB at a distance OA cos AOB from O. Then 

AC^ = OA^ + OC2 - 2OA . OC cos AOB 

^ ^ .. ^OA^-OC\ (i) 

which proves the proposition. 

Also _ . . OC^ + CA^ - OA^ 


C 

Fig. 3 


B 


cos OCA 


= 0, 


2OC.CA 

whence ^ Z. OCA = Jtt. (3) 

We have therefore constructed a triangle with one angle
equal to $\tfrac{1}{2}\pi$. We can now introduce the definition of
perpendicularity. If the angle between two intersecting straight
edges is $\tfrac{1}{2}\pi$, they are said to be perpendicular. Euclid i, 47
follows immediately from the formula 7·32 (5).

If OB is a straight edge with a mark C on it such that
$\angle OCA$ is $\tfrac{1}{2}\pi$, where A is a mark not on OB, then C is called
the foot of the perpendicular from A to OB.
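
The construction of 7·47 can be tried numerically. The Python sketch below is an illustration only, under stated assumptions: the marks are given plane co-ordinates purely as a convenient way of producing a consistent set of distances, and the names `cos_angle` and `foot_of_perpendicular` are hypothetical, not taken from the text. The cosine is computed from the three distances alone, by the same cosine formula used in the proof.

```python
import math

def cos_angle(o, a, b):
    """Cosine of the angle AOB, from the three distances OA, OB, AB only."""
    oa, ob, ab = math.dist(o, a), math.dist(o, b), math.dist(a, b)
    return (oa**2 + ob**2 - ab**2) / (2 * oa * ob)

def foot_of_perpendicular(o, a, b):
    """Mark C on the edge OB at distance OA cos AOB from O."""
    oa, ob = math.dist(o, a), math.dist(o, b)
    oc = oa * cos_angle(o, a, b)
    if not 0 < oc < ob:
        raise ValueError("conditions of the proposition are not satisfied")
    t = oc / ob  # fraction of the way from O to B
    return tuple(oi + t * (bi - oi) for oi, bi in zip(o, b))

# Hypothetical marks, chosen only for illustration.
O, A, B = (0.0, 0.0), (3.0, 4.0), (10.0, 0.0)
C = foot_of_perpendicular(O, A, B)

oa2 = math.dist(O, A) ** 2
oc2 = math.dist(O, C) ** 2
ac2 = math.dist(A, C) ** 2
print(C, oa2, oc2 + ac2)      # OA^2 and OC^2 + AC^2 agree up to rounding
print(cos_angle(C, O, A))     # cos OCA, zero up to rounding
```

The check that cos OCA vanishes uses only distances, so it does not presuppose the notion of perpendicularity that the construction is meant to introduce.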


7·48. If two straight edges OA, OB intersect at O (Fig. 4),
and C is any mark in OA, and if the length of OB exceeds
OC sec AOB, then we can make a mark D on OB so that
C is the foot of the perpendicular from D to OA.

For we can make a mark D so that OD = OC sec AOB,
and the perpendicularity follows as in 7·47.

[Fig. 4]


7·49. If the angle AOB (Fig. 5) is $\tfrac{1}{2}\pi$, we have

$$\cos OAB = \frac{OA^2 + AB^2 - OB^2}{2OA\cdot AB} = \frac{OA}{AB}, \qquad (1)$$

$$\sin OAB = \cos(\tfrac{1}{2}\pi - OAB) = \cos OBA \ \text{(by 7·42)} = \frac{OB}{AB},$$

$$\tan OAB = \frac{\sin OAB}{\cos OAB} = \frac{OB}{OA},$$

with corresponding formulae for the other trigonometric
functions. These results thus emerge as laws, and not as
definitions of the functions as in ordinary trigonometry.


7·50. Consider three edges meeting in a point O (Fig. 6). It is
always possible to fix A in one of them so that A is the foot
of the perpendiculars from marks B and C on the other two,
since the condition of 7·47 can always be satisfied by making
OA short enough. Then

$$AB = OA\tan AOB, \qquad (1)$$

$$OB = OA\sec AOB, \qquad (2)$$

$$AC = OA\tan AOC, \qquad (3)$$

$$OC = OA\sec AOC, \qquad (4)$$

$$\begin{aligned}
BC^2 &= OB^2 + OC^2 - 2OB\cdot OC\cos BOC \\
&= OA^2\,(\sec^2 AOB + \sec^2 AOC - 2\sec AOB\sec AOC\cos BOC). \qquad (5)
\end{aligned}$$

Also

$$\begin{aligned}
BC^2 &= AB^2 + AC^2 - 2AB\cdot AC\cos BAC \\
&= OA^2\,(\tan^2 AOB + \tan^2 AOC - 2\tan AOB\tan AOC\cos BAC). \qquad (6)
\end{aligned}$$

Equating (5) and (6), and multiplying by cos AOB cos AOC,
we have

$$\cos BOC = \cos AOB\cos AOC + \sin AOB\sin AOC\cos BAC. \qquad (7)$$




This theorem introduces the third dimension for the first 
time, for it allows two different lines AB, AC to be both 
perpendicular to an edge OA at the same point. The formula 
(7) is the analogue of a familiar one in spherical trigonometry, 
though the sphere as such has not yet appeared. 
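
Formula (7) of 7·50 is easily tested on a particular configuration. The following Python sketch is illustrative only: the co-ordinates of the marks are hypothetical and serve merely to generate directions, with A placed on one edge and B, C chosen so that AB and AC are both perpendicular to OA; the helper names `sub` and `angle` are assumptions of the example.

```python
import math

def sub(p, q):
    """Displacement from q to p."""
    return tuple(a - b for a, b in zip(p, q))

def angle(u, v):
    """Angle between two directions, in radians."""
    dot = sum(a * b for a, b in zip(u, v))
    c = dot / (math.hypot(*u) * math.hypot(*v))
    return math.acos(max(-1.0, min(1.0, c)))  # clamp against rounding

# Hypothetical marks: OA lies along one axis; B and C share A's height,
# so AB and AC are both perpendicular to OA but in different directions.
O = (0.0, 0.0, 0.0)
A = (0.0, 0.0, 2.0)
B = (1.5, 0.0, 2.0)
C = (-0.4, 1.1, 2.0)

aob = angle(sub(A, O), sub(B, O))
aoc = angle(sub(A, O), sub(C, O))
boc = angle(sub(B, O), sub(C, O))
bac = angle(sub(B, A), sub(C, A))

lhs = math.cos(boc)
rhs = math.cos(aob) * math.cos(aoc) + math.sin(aob) * math.sin(aoc) * math.cos(bac)
print(lhs, rhs)  # the two sides of (7) agree up to rounding
```

Moving A along its edge (changing the common height of A, B and C while keeping the three directions from O) leaves the angle BAC unchanged, which is the corollary drawn in the text.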

It follows as a corollary that $\angle BAC$ is independent
of OA.

If B, A, C are collinear, cos BAC = -1, and (7) leads to

$$\angle BOC = \angle AOB + \angle AOC.$$

This is equivalent to 7·45 when actual straight edges connect
the marks.

7·51. We can now proceed to a discussion of planes. If
we take two fixed marks O and O', and any path from
O to O', then at one end of the path the distance from
O is greater than that from O', and at the other end the
opposite is true. Both distances vary continuously, and
therefore there is a possible position on the path such that
the distances from O and O' are equal. Marks equidistant
from two fixed marks are said to be on the plane
determined by the fixed marks.

[Fig. 7]

7·52. If two marks P and Q are on a plane, every mark R
collinear with them is on the plane. For since P and Q are
on the plane we can write

$$PO = PO' = p; \quad QO = QO' = q; \quad PQ = r, \qquad (1)$$

$$\cos OPQ = \cos O'PQ = \frac{p^2 + r^2 - q^2}{2pr} = k, \text{ say.} \qquad (2)$$

Then

$$\begin{aligned}
RO^2 &= PO^2 + PR^2 - 2PO\cdot PR\cos OPQ \\
&= p^2 + PR^2 - 2p\cdot PR\cdot k \\
&= RO'^2 \text{ by symmetry.} \qquad (3)
\end{aligned}$$


7·53. If two planes are determined by pairs of marks O and
O', H and H', three circumstances may arise. All positions
P in the first plane may be equidistant from H and H';
then the planes are identical. All positions P in the first
plane may be such that PH > PH', or all such that PH < PH':
then the planes have no common point. Some positions P in
the first plane may be such that PH > PH', and others such
that PH < PH'. Then we may classify the positions on the
first plane according to the sign of PH - PH'; on any path
from a position where this is positive to one where it is
negative, the difference varies continuously and therefore passes
through the value zero. Thus it is possible to assign marks
common to both planes; they are said to be on the line of
intersection of the planes.

All marks on the line of intersection of two planes are
collinear. For if Q and R are on this line, every mark
collinear with Q and R is common to both planes, by 7·52;
hence the marks collinear with Q and R constitute the line
of intersection. Euc. xi, 3.

7·54. Any three marks A, B, C lie on a plane. For if we take
a point B' collinear with BA so that B'A = AB, and one C'
on CA so that C'A = AC, then B' and B determine one
plane and C' and C another. Clearly A lies on both planes.
If O is another common point (Fig. 8) we have

$$OB' = OB; \quad OA = OA; \quad AB' = AB,$$

and therefore

$$\angle B'AO = \angle OAB = \tfrac{1}{2}\pi.$$

Thus AO is perpendicular to AB, and similarly to AC. Now
take O' collinear with OA so that AO' = OA. Then

$$\angle OAB = \angle O'AB = \tfrac{1}{2}\pi; \quad AO = O'A; \quad AB = AB;$$

and therefore O'B = OB.

Similarly O'C = OC,

and A, B, C are all on the plane determined by O and O'.

It follows from 7·54 and 7·52 that if we start with any
three marks we can generate a plane containing them by
joining up points collinear with pairs from the original three.



7·55. In general a line has one point in common with a plane.
For if P and Q are marks on the line (Fig. 9) and O, O'
determine the plane, and R is another point on the line, in the
direction PQ,

$$OR^2 = OP^2 + PR^2 - 2OP\cdot PR\cos OPQ, \qquad (1)$$

$$O'R^2 = O'P^2 + PR^2 - 2O'P\cdot PR\cos O'PQ, \qquad (2)$$

and therefore OR = O'R if

$$2PR\,(O'P\cos O'PQ - OP\cos OPQ) = O'P^2 - OP^2. \qquad (3)$$

If then $O'P\cos O'PQ - OP\cos OPQ$ and $O'P^2 - OP^2$ have
the same sign, there is a positive value of PR satisfying (3).
If they have opposite signs there is no positive value of PR
satisfying (3). But if Q' is on the line and P is between Q and
Q', there is a suitable point R in the direction PQ', since

$$O'P\cos O'PQ' - OP\cos OPQ' = -(O'P\cos O'PQ - OP\cos OPQ).$$

Clearly the conditions for suitable positions of R in the
directions PQ and PQ' are mutually exclusive, and there is
always one position of R that satisfies the conditions. If
however $O'P\cos O'PQ = OP\cos OPQ$ the admissible value of PR is
infinite; in this case we say that the line is parallel to the plane.
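
Equation (3) of 7·55 gives PR directly once the four quantities in it are known. The Python sketch below is an illustration under stated assumptions: the marks are assigned hypothetical co-ordinates so that the distances and cosines can be evaluated, the function name `meet_plane` is invented for the example, and a negative returned value of PR is read as a point in the opposite direction PQ', as in the argument above.

```python
import math

def meet_plane(P, Q, O, O2):
    """Solve equation (3) for the signed distance PR along the direction PQ.

    Returns (PR, R), or None when the line is parallel to the plane
    determined by the marks O and O2 (written O' in the text).
    """
    pq = math.dist(P, Q)
    u = [(q - p) / pq for p, q in zip(P, Q)]  # unit direction from P to Q
    # OP cos OPQ is the projection of the displacement P->O on the direction PQ.
    op_cos = sum((o - p) * ui for o, p, ui in zip(O, P, u))
    o2p_cos = sum((o - p) * ui for o, p, ui in zip(O2, P, u))
    denom = 2 * (o2p_cos - op_cos)
    if denom == 0:
        return None  # O'P cos O'PQ = OP cos OPQ: the line is parallel to the plane
    pr = (math.dist(O2, P) ** 2 - math.dist(O, P) ** 2) / denom
    R = tuple(p + pr * ui for p, ui in zip(P, u))
    return pr, R

# Hypothetical marks for illustration.
O, O2 = (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)
P, Q = (1.0, 2.0, 3.0), (1.0, 2.0, 2.0)
pr, R = meet_plane(P, Q, O, O2)
print(pr, R, math.dist(O, R), math.dist(O2, R))  # R is equidistant from O and O'

# A line along which both projections are equal never meets the plane.
print(meet_plane(P, (2.0, 2.0, 3.0), O, O2))
```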

7·56. It follows that in general there is one point common to
three planes.

Lines in a plane are said to be parallel if they make the
same angle with a given line.

7·57. Parallel lines have no common point at a finite distance.
For if (Fig. 10)

$$\angle CAB = \alpha = \angle DBE,$$

$$\angle DBA = \pi - \alpha.$$

If then AC and BD had a common point L, we should have
the sum of the angles of the triangle ABL equal to

$$\alpha + (\pi - \alpha) + \angle ALB = \pi + \angle ALB.$$

But this is impossible since A, B, L are not collinear and
$\angle ALB \neq 0$.

[Fig. 10]


[Fig. 11]

7·58. Any transversal intersecting the original one makes the
same angle with two parallel lines. Suppose the parallels are
given to make the same angle $\alpha$ with the line AB, and that
CD is another line meeting them (Fig. 11). Let $\angle ACD = \beta$.

Then

$$\angle EAC + \angle ECA = \pi - \angle AEC,$$

$$\angle EBD + \angle EDB = \pi - \angle BED.$$

But

$$\angle EAC = \angle EBD.$$

Therefore

$$\angle ECA = \angle EDB.$$

7·59. If three lines AB, AC, AD are all perpendicular to OA,
they are in a plane. For if we make AO' = OA, we have

$$BO'^2 = AB^2 + AO'^2 = AB^2 + OA^2 = BO^2,$$

and so on. Hence B, C, and D are all in the plane determined
by O and O'.

7·60. Consider any two points L, M and a straight edge OP.
Suppose points A on OP, B on OL, C on OM to have been found
such that BA, CA are perpendicular to OP (Fig. 12). Let

$$OL = r, \quad OM = r', \quad \angle LOP = \theta, \quad \angle MOP = \theta'.$$

Then

$$LM^2 = r^2 + r'^2 - 2rr'\cos LOM, \qquad (1)$$

$$\cos LOM = \cos\theta\cos\theta' + \sin\theta\sin\theta'\cos BAC, \qquad (2)$$

by 7·50. If AK be any other straight edge through A
perpendicular to OP, K, A, B, C are in a plane, by 7·59. Let

$$\angle KAB = \phi, \quad \angle KAC = \phi'.$$

Then

$$\angle BAC = |\phi - \phi'|, \qquad (3)$$

$$\begin{aligned}
LM^2 = {} & (r\cos\theta - r'\cos\theta')^2 + (r\sin\theta\cos\phi - r'\sin\theta'\cos\phi')^2 \\
& + (r\sin\theta\sin\phi - r'\sin\theta'\sin\phi')^2. \qquad (4)
\end{aligned}$$

If now we define x, y, z for L by the equations

$$x = r\sin\theta\cos\phi; \quad y = r\sin\theta\sin\phi; \quad z = r\cos\theta, \qquad (5)$$

we have

$$LM^2 = (x - x')^2 + (y - y')^2 + (z - z')^2. \qquad (6)$$

Thus distance has been expressed in the standard form
appropriate to Cartesian co-ordinates.
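
The passage from (1) and (2) to (6) can be checked numerically. The sketch below, in Python, is an illustration only: the values of (r, θ, φ) for the two marks are hypothetical, and the function names `cartesian` and `lm_squared_from_angles` are assumptions of the example. It compares the distance obtained from the cosine formula with that obtained from the sum of squared differences of the co-ordinates defined by (5).

```python
import math

def cartesian(r, theta, phi):
    """Co-ordinates defined by equations (5)."""
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))

def lm_squared_from_angles(r, theta, phi, r2, theta2, phi2):
    """LM^2 from equations (1) and (2), with cos BAC = cos(phi - phi')."""
    cos_lom = (math.cos(theta) * math.cos(theta2)
               + math.sin(theta) * math.sin(theta2) * math.cos(phi - phi2))
    return r**2 + r2**2 - 2 * r * r2 * cos_lom

# Hypothetical distances and bearings of two marks L and M.
L = (2.0, 0.7, 0.3)
M = (1.2, 1.9, 2.5)

from_angles = lm_squared_from_angles(*L, *M)
from_coords = math.dist(cartesian(*L), cartesian(*M)) ** 2
print(from_angles, from_coords)  # equal up to rounding
```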

The angles $\phi$ and $\phi'$ are independent of the position of A
provided AK is always taken in the same plane. The angle
between two planes is defined as the angle between two lines
in them perpendicular to the common line, and is constant
by the corollary to 7·50.

The above definition of Cartesian co-ordinates is applicable in
all cases where it is possible to find the distances and bearings of
our marks, whereas the usual definition is not applicable unless
we can actually find the projections on the three co-ordinate
axes. We have still to show that our x, y, z are identical with
the usual co-ordinates when these can be measured.

7·61. If we put x = lr, y = mr, z = nr, and consider two
marks L, M given by $(x_1, y_1, z_1)$, $(x_2, y_2, z_2)$ we have from
7·60 (2)

$$\cos LOM = l_1l_2 + m_1m_2 + n_1n_2.$$

7·62. At any point of OP, $\theta = 0$, and therefore l = 0, m = 0,
n = 1. If OQ is perpendicular to OP, it appears from 7·61
that n = 0 at Q. If also Q is in the same plane as OAK,
$\phi = 0$ at Q and m = 0. Hence at Q, l = 1, m = 0, n = 0.
If OR is perpendicular to OP and OQ, then again by 7·61,
at R, l = 0, m = 1, n = 0. Thus OQ, OR, OP are the
co-ordinate axes as usually understood.

7·63. If L (x, y, z) be another point, the angle between OL
and OQ is given by

$$\cos LOQ = l\cdot 1 + m\cdot 0 + n\cdot 0,$$

and therefore by 7·49 the projection of OL on OQ is rl or x.
Similar results hold for the projections on the other axes. This
gives the identification required. If one or more of l, m, n are
negative, the corresponding foot of the perpendicular from
L on the axis lies on the production of the axis beyond O.

7·64. If now a plane is determined by two points (a, b, c),
(a', b', c'), we have for all points on the plane

$$(x - a)^2 + (y - b)^2 + (z - c)^2 = (x - a')^2 + (y - b')^2 + (z - c')^2, \qquad (1)$$

that is,

$$2(a - a')x + 2(b - b')y + 2(c - c')z = a^2 + b^2 + c^2 - a'^2 - b'^2 - c'^2. \qquad (2)$$

Hence a plane has an equation of the first degree in the
co-ordinates. Conversely if we are given an equation of the
first degree, which in general involves three independent
parameters, we can in a triply infinite number of ways assign
the six co-ordinates of O and O' so as to make (2) fit it. Thus
every equation of the first degree represents a plane.

It follows that a straight edge is represented by a pair of
equations of the first degree. Also if a plane has the equation

$$Ax + By + Cz + D = 0, \qquad (3)$$

and P $(x_1, y_1, z_1)$, Q $(x_2, y_2, z_2)$ are two points satisfying (3),
then the point R with co-ordinates

$$\left(\frac{m_1x_1 + m_2x_2}{m_1 + m_2},\ \frac{m_1y_1 + m_2y_2}{m_1 + m_2},\ \frac{m_1z_1 + m_2z_2}{m_1 + m_2}\right)$$

also satisfies (3).

If then P and Q are points common to two planes, the point
R is also common to the two planes. If we call R (x, y, z) we
see that (x, y, z) satisfy

$$\frac{x - x_1}{x_2 - x_1} = \frac{y - y_1}{y_2 - y_1} = \frac{z - z_1}{z_2 - z_1},$$

the usual form of the equations of a straight line.

From these results the usual analytic development can be 
carried out. 
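
The two statements of 7·64, that the marks equidistant from O and O' satisfy a linear equation and that weighted means of points on a plane lie on it, can be illustrated as follows. This Python sketch rests on assumptions made for the example: the co-ordinates of O, O', P and Q and the weights are hypothetical, and the function names are invented. The coefficients are read off from equation (2).

```python
import math

def plane_from_marks(O, O2):
    """Coefficients (A, B, C, D) of A x + B y + C z + D = 0, from equation (2)."""
    A, B, C = (2 * (o - o2) for o, o2 in zip(O, O2))
    D = sum(o2**2 for o2 in O2) - sum(o**2 for o in O)
    return A, B, C, D

def residual(plane, point):
    """Left-hand side of the plane equation at a point (zero if on the plane)."""
    A, B, C, D = plane
    x, y, z = point
    return A * x + B * y + C * z + D

def weighted_mean(P, Q, m1, m2):
    """Point whose co-ordinates are weighted means of those of P and Q."""
    return tuple((m1 * p + m2 * q) / (m1 + m2) for p, q in zip(P, Q))

# Hypothetical marks determining the plane, and two points on it.
O, O2 = (1.0, 2.0, 3.0), (3.0, 0.0, -1.0)
plane = plane_from_marks(O, O2)
P, Q = (0.0, 1.0, 0.0), (1.0, 0.0, 1.0)
R = weighted_mean(P, Q, 2.0, 5.0)

print(plane)
print(residual(plane, P), residual(plane, Q), residual(plane, R))
print(math.dist(O, R), math.dist(O2, R))  # R is equidistant from O and O'
```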


7·7. The foregoing theory has been developed from the
notion and properties of distance alone. Most of the
propositions inferred are verifiable by experiment, and have, of
course, actually been verified. But other appliances exist that
often enable us to supplement the theory and extend its
practical application. The first we shall consider is the
protractor, or graduated circle. A rigid body with one face
plane, as tested by the application of a straight edge, is made 
so as to have a circular edge on this face ; that is, every point 
on the rim is at the same distance from some fixed mark on 
the face. Equidistant marks are made around the rim, the 
distance between consecutive marks being compared in the 
process of manufacture with the length of the turn of a 
standard screw in much the same way as in the construction 
of a scale on a straight edge. Then if we consider a triangle 
formed by the centre and any two consecutive marks on the 
rim, all such triangles have the same sides, and therefore the 
same angles. Now if we place the protractor with its centre 
in contact with the common mark on two intersecting 
straight edges, and with the rim intersecting both edges, we 
can count the number of scale-divisions on the rim between 
the two edges and use it as a measure of the angle between 
the edges; for either determines the other. Thus the pro- 
tractor measures angle as a fundamental magnitude. 

The actual distance between scale-divisions on a protractor 
is arbitrary. In practice it is always chosen so that 360 
divisions make up the complete circumference and return to 
the starting-point. If necessary finer graduations are inserted 
within the original 360. The degree is the angle subtended at 
the centre by two consecutive divisions on the edge. Now in
our measures of angles so far we have specified the angle in
terms of its cosine by the series definition of the latter; the
number attached to a right angle is $\tfrac{1}{2}\pi$, and that attached to
a complete circumference is $2\pi$. Is an angle, in terms of this
measure, merely a number? The test seems to be in attempting
addition. 2 sheep and 3 sheep make 5 sheep. But 2
sheep and 3 houses do not make 5 of anything. Now do an
angle 2 and the number 3 make 5 of anything? It appears
that they do not. Angle is of a different kind from number,
and when we specify it in such a way that the number
attached to a right angle is $\tfrac{1}{2}\pi$ we are really measuring it in
terms of a conventional unit, which we call the radian, and is
not a number. We should therefore write

$\tfrac{1}{2}\pi$ radians = 90 degrees.

This provides the necessary rule for converting measures of 
angles from one unit to another. 
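
As a small illustration of the conversion rule, the following Python sketch converts between the two units using only the single relation stated above; the function and constant names are hypothetical and chosen for the example.

```python
import math

# The single relation used: (1/2) pi radians = 90 degrees.
RIGHT_ANGLE_RADIANS = math.pi / 2
RIGHT_ANGLE_DEGREES = 90.0

def radians_to_degrees(x):
    """Convert a measure in radians to degrees."""
    return x * RIGHT_ANGLE_DEGREES / RIGHT_ANGLE_RADIANS

def degrees_to_radians(d):
    """Convert a measure in degrees to radians."""
    return d * RIGHT_ANGLE_RADIANS / RIGHT_ANGLE_DEGREES

full_turn_degrees = radians_to_degrees(2 * math.pi)
one_degree_radians = degrees_to_radians(1.0)
print(full_turn_degrees, one_degree_radians)
```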

The direct measurement of angle with a protractor now 
provides a complete substitute for the determination of the 
angle between straight edges in terms of measured distances. 
It is still not possible in general when we want the angle 
XOY and the mark O is not connected to X and Y by actual
straight edges. But we can supplement our methods again
by using a property of light. It is found that whenever three
marks A, B, C are collinear as tested by the straight edge, and
the eye is placed so that two of them are in the same direction 
(a matter of direct sensation) the third is also in that direction. 
In practice we construct the two nearer marks, for accuracy, 
as the intersections of crossed threads, so that if the directions 
do not quite agree small discrepancies will be easily notice- 
able. Thus we have a direct test of collinearity, which agrees 
with the test of the straight edge whenever the latter can be 
applied. We can then generalize this as a test of collinearity 
and use it instead of the one based on lengths, since it is more 
accurate and easily applied. Now angles between the direc- 
tions of marks can be measured. Effectively the crossed 
wires O in the eyepiece, a distant pair X in the instrument, 
and the object mark A are placed in a line by the test of 
coincidence of visual direction; another distant pair Y is 
placed in line with O and the second object mark B. Then
the angle AOB is the same as the angle XOY, which can
then be measured with a protractor. The sextant is based on
a modification of this principle. The theodolite effectively
contains two protractors, and measures two angles
corresponding to the $\theta$ and $\phi$ of 7·60; $\theta$ is measured from the
upward vertical and $\phi$ from a plane including the upward
vertical and the north.

In general three measured data are necessary and sufficient
to identify a position. They may be any functions of $(r, \theta, \phi)$
or of (x, y, z) provided that none of them is a function of
the other two. But our principle of simplicity gives reason
for regarding the Cartesian co-ordinates as the physically 
fundamental ones. Our directly recognized entities are 
straight lines and distances, and a plane is a notion that arises 
directly out of distance. Now Cartesian co-ordinates have 
the following simple features possessed by no others. Any 
plane is expressed by an equation linear in the co-ordinates. 
Any straight line is expressed by a pair of linear equations; 
and the co-ordinates of any point on the line are weighted 
means of those of any two other points on the line. The 
square of the distance between any two marks is the sum of 
the squares of the differences between the co-ordinates. No 
general relations of comparable simplicity hold for any other 
type of co-ordinates. We regard Cartesian co-ordinates as the 
physically fundamental ones on account of our principle that 
the fundamental laws of physics are simple in form. 



CHAPTER VIII 


NEWTONIAN DYNAMICS 

Nature, and Nature’s laws, lay hid in night: 

God said, Let Newton be! and all was light.

Pope 

8·1. Many rigid systems exist. The criterion for a rigid
system is that the distances between recognizable marks in it
do not change with the time. If one distance in a system and 
all angles, as tested by optical instruments with graduated 
circles, do not change with the time, we still call the system 
rigid by our rules. Over considerable intervals of time most 
of the objects in this room constitute a rigid system. The 
angular distances between stars, as observed from the earth, 
vary with the time so little that decades are required to detect 
alteration even with the best measuring instruments. If then 
we consider a system of lines through a given point, and each 
directed towards a star, that constitutes a rigid system. 

When distances change with the time we are in a new 
realm, called dynamics. The marks in one rigid system may 
change their distances or directions from those in another 
rigid system. Thus a theodolite and the objects on the earth 
within its field of view constitute one rigid system ; the stars 
and an equatorial telescope with the clockwork going con- 
stitute another; but the directions of the stars change with 
respect to the theodolite, and those of objects on the earth 
change with respect to the equatorial. 

Objects whose distances and directions with respect to a 
rigid system are varying with the time are said to have motion 
relative to the system. Distance and angle have so far been 
considered only when they are constant for a given set of two 
or three marks. But even when they vary with the time they 
still exist. We can specify the position of a particle sliding
down a curve by the mark on the curve that the particle is
passing over. We can specify the direction of a planet by 
pointing an equatorial telescope towards it, and reading its 
right ascension and declination on the graduated circles; or 
we can photograph the region of the sky where it is, and 
measure its angular distances from neighbouring fixed stars 
just as we can measure the angular distances between these 
stars themselves. In such a case as the ascent of a pilot bal- 
loon, observed with two theodolites, we can actually observe 
the directions from two positions simultaneously and deter- 
mine the position of the balloon at each instant of observation 
just as for a fixed object. In dynamics we are therefore dealing 
with cases where distances, and those entities we have found 
to depend on them, still exist, but are now functions of the 
time instead of being constant. 

One clearly cut distinction arises immediately. In most 
rigid systems there is a continuous material connexion, trace- 
able by sight and touch, between all parts. If no such con- 
nexion is evident, as in a body in mid-air, or a planet, or the 
components of a double star, there is in general motion rela- 
tive to other rigid systems. There is therefore a strong sug- 
gestion that material connexion between bodies is antagonistic 
to relative motion. Even if there is relative motion to begin 
with, as in the case of a body projected along the floor, it 
soon stops when material contact is established. We shall not 
at present examine the nature of this phenomenon ; we merely 
give it a name. The property that one body does not move
through another we call impenetrability; the property that
relative motion tends to cease when one body slides over
another we call friction. In dynamics, then, we refer our
measurements to some rigid system, in which the laws re- 
lating measurements, whether made at the same or at different 
times, are already known ; but our subject-matter is the motion 
with reference to our rigid system of a body or bodies that 
are not constrained by material connexion to have no motion 
with reference to it. An immediate inference is that we should
consider in the first place those bodies that have the slightest
possible material connexion with our frame of reference ; in 
this way we reduce to a minimum the interference of material 
connexion with motion and remove one independent variable. 

8·2. This condition is obviously satisfied fairly well by bodies
in mid-air, and very well by the heavenly bodies. In the
former case we take as our frame of reference axes fixed in
the earth, and find that the motion, at any rate for massive
bodies, is well represented by the differential equations

$$\frac{d^2x}{dt^2} = 0, \quad \frac{d^2y}{dt^2} = 0, \quad \frac{d^2z}{dt^2} = -g, \qquad (1)$$

where x and y are measured horizontally and z vertically
upwards, as judged by a plumb-line. The derived magnitude g
is nearly constant. Now by simultaneous observations of the
stars from different places on the earth's surface we find that
the direction of the plumb-line is not everywhere the same
with reference to the directions of the stars, but points nearly
to a fixed point in the earth, which we call the centre. If we
take new Cartesian co-ordinates with respect to the centre of
the earth as origin we now find that, wherever we are on the
earth's surface, the equations (1) lead to

$$\frac{d^2x}{dt^2} = -g\frac{x}{r}, \quad \frac{d^2y}{dt^2} = -g\frac{y}{r}, \quad \frac{d^2z}{dt^2} = -g\frac{z}{r},$$

where r is measured from the centre of the earth. This is a 
more general form than (i). It leads to a further suggestion, 
that the second derivatives of the Cartesian co-ordinates 
with respect to the time are of fundamental importance in 
dynamics; for they are expressed by three known functions 
of the co-ordinates themselves, and lead thereby to three 
differential equations for these co-ordinates. We call the first 
derivatives of the Cartesian co-ordinates the components of 
relative velocity, and the second derivatives the components 
of relative acceleration.

8·21. Now consider the motion of the components of a
double star. We take an axis of x in a direction fixed with
reference to the directions of the majority of the stars, and
such that the double star is always near it. The axes of y
and z are taken in two directions perpendicular to each other
and to that of x. We observe the angles between the direction
of a component of the star and the two planes of xy and xz;
or, what is equivalent, we take the point of intersection, with
a plane plate perpendicular to the x axis, of the line joining
the centre of the object glass of the telescope to the
component. It is found that as time goes on the points given by
the two components describe similar ellipses. If we put
y/x = p, z/x = q, and use suffixes 1 and 2 for the two
components, the variations of p and q in the same interval of
time are always opposite in direction and always in the same 
ratio, except for a uniform velocity shared by both com- 
ponents of the star. If we proceed to the second derivatives 
to remove this uniform part of the rate of change, we have 


A _ ^ 

P 2 


(I) 


Further, each ratio is equal to — A, Now p and q are 

always small, and the displacements at right angles to the line 
of sight are therefore small fractions of the whole distance. 
We must choose between two alternatives with regard to the 
displacements in the line of sight. If they are also small com- 
pared with the distance of the star, the variation of x is small 
compared with its mean value, and y and z for each com- 
ponent are nearly proportional to p and q. Then 


(2) 

^2 ^2-^1 

The alternative is that the displacements in the line of sight 
are comparable with the distance ; this would mean that the 
orbit of every double star is enormously elongated towards 
the earth, and we need not consider this possibility seriously.
Returning to (2) now, we see that there is probably nothing
special about the line of sight and we can generalize the
equations in the form

$$\frac{\ddot{x}_1}{x_2 - x_1} = \frac{\ddot{y}_1}{y_2 - y_1} = \frac{\ddot{z}_1}{z_2 - z_1}; \quad \frac{\ddot{x}_2}{x_2 - x_1} = \frac{\ddot{y}_2}{y_2 - y_1} = \frac{\ddot{z}_2}{z_2 - z_1}. \qquad (3)$$

The components of acceleration are in the ratios of the
differences of the co-ordinates; in other words, the accelerations of
the bodies are along the line joining them. Further, we can
choose a ratio of two quantities $m_1$ and $m_2$ such that

$$m_1\ddot{x}_1 + m_2\ddot{x}_2 = 0; \quad m_1\ddot{y}_1 + m_2\ddot{y}_2 = 0; \quad m_1\ddot{z}_1 + m_2\ddot{z}_2 = 0. \qquad (4)$$

Distant bodies appear to produce no acceleration on one
another, as is seen from the negligible or constant velocities
of most of the stars. Hence we can say that the acceleration
of each component is due to the proximity of the other
component.

Similar results are found for most of the satellites of the 
planets ; they are consistent with the acceleration in each case 
being directed towards the centres of the planets. 

8·22. Now consider the acceleration of the moon. To a first
approximation the moon describes a circle about the earth
with radius a and angular velocity n. The acceleration in such
a path is $an^2$ towards the centre of the earth. Taking

$$a = 3·8 \times 10^{10} \text{ cm.}, \quad n = 2\pi/27·3 \text{ days},$$

we find that the acceleration is 0·273 cm./sec.² Now a particle
at the earth's surface has acceleration 980 cm./sec.², which is
nearly 3600 times the acceleration of the moon. The distances
are in the ratio 1 : 60 nearly, and therefore the accelerations
are nearly inversely as the squares of the distances.
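
The arithmetic of 8·22 can be retraced in a few lines. The Python sketch below is illustrative only; it uses the rounded values of a and of the month quoted above, takes the day as 86400 seconds (an assumption of the example), and the variable names are hypothetical. With these rounded inputs the acceleration comes out close to 0·27 cm./sec.², and the ratio to 980 cm./sec.² close to the square of 60.

```python
import math

SECONDS_PER_DAY = 86400.0          # assumed length of the day in seconds

a = 3.8e10                         # radius of the moon's orbit, cm (value quoted in the text)
n = 2 * math.pi / (27.3 * SECONDS_PER_DAY)   # angular velocity, radians per second

moon_acceleration = a * n**2       # acceleration in a circular path, cm/sec^2
surface_acceleration = 980.0       # cm/sec^2, value quoted in the text

ratio = surface_acceleration / moon_acceleration
print(moon_acceleration)           # roughly 0.27
print(ratio, math.sqrt(ratio))     # ratio near 3600, square root near 60
```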

8·23. These few facts relating to freely moving bodies
suggest the following summary:

A body has an acceleration in the direction of a
neighbouring body, and proportional in magnitude to the inverse
square of the distance.

The accelerations that two bodies produce on each other
are in a ratio independent of the time.

The first of these laws can be written in the form

$$\ddot{x}_1 = \frac{\mu_2}{r^2}\,\frac{x_2 - x_1}{r}; \quad \ddot{y}_1 = \frac{\mu_2}{r^2}\,\frac{y_2 - y_1}{r}; \quad \ddot{z}_1 = \frac{\mu_2}{r^2}\,\frac{z_2 - z_1}{r}, \qquad (1)$$

where $(x_1, y_1, z_1)$ are the co-ordinates of the body whose
acceleration we want, $(x_2, y_2, z_2)$ those of the other body,
r is the distance between the two bodies, and $\mu_2$
is a constant of proportionality independent of the
co-ordinates and of t. The second law then implies that

$$\ddot{x}_2 = \frac{\mu_1}{r^2}\,\frac{x_1 - x_2}{r}; \quad \ddot{y}_2 = \frac{\mu_1}{r^2}\,\frac{y_1 - y_2}{r}; \quad \ddot{z}_2 = \frac{\mu_1}{r^2}\,\frac{z_1 - z_2}{r}, \qquad (2)$$

where $\mu_1$ is a second constant of proportionality, different in
general from $\mu_2$.

This family of differential equations can be solved exactly. 
They are found to imply the following consequences : 

A point with co-ordinates (x, y, z) given by

$$(\mu_1 + \mu_2)\,x = \mu_1x_1 + \mu_2x_2, \qquad (3)$$

and two similar equations, moves with uniform velocity in
a straight line. We call this point the centroid of the two
particles.

Relative to this point both the bodies describe ellipses, the
ellipses being similar but having their axes in opposite
directions, and the centroid being in a focus of each.

The line joining the centroid to either body sweeps out in
any interval of time an area proportional to that interval.

The mean distance a between the bodies being defined as
the mean of their greatest and least distances apart, and the
mean motion n as $2\pi$ divided by the time of describing the
orbit,

$$n^2a^3 = \mu_1 + \mu_2. \qquad (4)$$
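
Some of these consequences can be illustrated by integrating equations (1) and (2) numerically. The Python sketch below is an illustration under stated assumptions, not the exact solution referred to in the text: it uses a simple step-by-step (velocity Verlet) integration in two dimensions, hypothetical values of $\mu_1$, $\mu_2$ and of the starting positions and velocities, and the standard relation between speed and distance in an elliptic orbit to pick a starting velocity giving mean distance 1·25. It then checks that the centroid moves uniformly, that the mean of the greatest and least distances is the intended a, and that the bodies return to their relative starting position after the time $2\pi/n$ given by (4).

```python
import math

def accelerations(p1, p2, mu1, mu2):
    """Accelerations of the two bodies from equations (1) and (2)."""
    r3 = math.dist(p1, p2) ** 3
    a1 = [mu2 * (b - a) / r3 for a, b in zip(p1, p2)]
    a2 = [mu1 * (a - b) / r3 for a, b in zip(p1, p2)]
    return a1, a2

def simulate(p1, p2, v1, v2, mu1, mu2, total_time, steps):
    """Velocity Verlet integration; returns final positions and extreme distances."""
    dt = total_time / steps
    p1, p2, v1, v2 = list(p1), list(p2), list(v1), list(v2)
    a1, a2 = accelerations(p1, p2, mu1, mu2)
    r_min = r_max = math.dist(p1, p2)
    for _ in range(steps):
        for k in range(2):
            v1[k] += 0.5 * dt * a1[k]
            v2[k] += 0.5 * dt * a2[k]
            p1[k] += dt * v1[k]
            p2[k] += dt * v2[k]
        a1, a2 = accelerations(p1, p2, mu1, mu2)
        for k in range(2):
            v1[k] += 0.5 * dt * a1[k]
            v2[k] += 0.5 * dt * a2[k]
        r = math.dist(p1, p2)
        r_min, r_max = min(r_min, r), max(r_max, r)
    return p1, p2, r_min, r_max

# Hypothetical constants and starting conditions (least distance 1, intended a = 1.25).
mu1, mu2 = 3.0, 1.0
mu = mu1 + mu2
a = 1.25
period = 2 * math.pi * math.sqrt(a**3 / mu)     # from n^2 a^3 = mu1 + mu2
v_rel = math.sqrt(mu * (2 / 1.0 - 1 / a))       # assumed speed-distance relation
vc = (0.3, -0.1)                                # velocity given to the centroid

p1_0, p2_0 = (0.0, 0.0), (1.0, 0.0)
v1_0 = (vc[0], vc[1] - mu2 / mu * v_rel)
v2_0 = (vc[0], vc[1] + mu1 / mu * v_rel)

p1, p2, r_min, r_max = simulate(p1_0, p2_0, v1_0, v2_0, mu1, mu2, period, 20000)

centroid_0 = [(mu1 * s + mu2 * t) / mu for s, t in zip(p1_0, p2_0)]
centroid = [(mu1 * s + mu2 * t) / mu for s, t in zip(p1, p2)]
expected = [c + v * period for c, v in zip(centroid_0, vc)]
relative = [t - s for s, t in zip(p1, p2)]

print(centroid, expected)            # centroid has moved uniformly
print((r_min + r_max) / 2)           # mean distance, close to 1.25
print(relative)                      # close to the starting separation (1, 0)
```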


These results express Newton’s solution of the Problem of 
Two Bodies. It is found to describe accurately the motions 
of double stars. Only motions at right angles to the line of
sight being measurable*, what we actually verify is that the
movements agree with the projections of elliptic motions that
follow the laws. In other words, the behaviour of two of the
three variables determined by the solution is completely
verified; our present analysis is neither confirmed nor
contradicted by observations of the other variable, for these do
not exist.

When we consider the motions of satellites about their 
primaries, the same solution is found to fit the relative motion, 
as to the two measurable co-ordinates. Also the planet itself 
shows no departure from a regular motion relative to the 
stars ; over intervals of time amounting to several periods of 
revolution of the satellites the motion of the planet is sensibly 
uniform. It appears therefore that the centroid of the planet
and any satellite is practically coincident with the centre of
the planet, and therefore that if $\mu_1$ refers to the planet and $\mu_2$
to the satellite, $\mu_2/\mu_1$ is always very small and $\mu_1 + \mu_2$ is
practically $\mu_1$. But for different satellites of the same planet
we get a further check; the quantity $n^2a^3$ is found to be the
same for all. The constancy of $n^2a^3$ for different satellites
also therefore implies that $\mu_1$ is a property of the planet.

Coming now to the motions of the sun and planets, we can 
observe in each case only directions as seen from the earth 
with reference to the stars, except in the case of the sun, where 
we can estimate the variation of its distance by measuring its 
angular diameter from time to time. In this case then we can 
check all three co-ordinates, and we find that the motion of 
the sun relative to the earth is definitely an ellipse with the 
earth in a focus. For the other planets it is found that 
ellipses can always be found with the sun in a focus, such that 
the radius vector relative to the sun sweeps out area at a
uniform rate, and the direction of the planet as seen from the
earth agrees with that predicted from the various elliptic
paths. It may be noticed that each elliptic orbit is specified
when six quantities are given, namely the three co-ordinates
and the three components of velocity relative to the sun at
some definite instant. With only these six adjustable
parameters it is possible to fit an indefinitely large number of
observations of direction as the planet describes its orbit.
The alternative to supposing that the motion is actually a
Newtonian elliptic orbit is that the distance of the planet from
the earth does not follow the rules found for elliptic motion,
but that the co-ordinates are so related that the direction does
satisfy these rules. As there is no intelligible reason why this
should be true apart from the truth of the equations of motion,
we do not treat this alternative seriously.

* That is to say, in terms of the considerations of direction that we have
used so far. Velocities in the line of sight can be measured by means of
the Doppler effect, and agree with the laws, but we are not yet in a position
to discuss the theory of that effect.

It is now found that for each planet the quantity $n^2a^3$ is
the same. As for satellites, we therefore argue that it expresses
a property of the sun, and that $\mu$ for each planet is very small
compared with its value for the sun. This can be checked
directly, since the values of $\mu$ for those planets that have
satellites are already known from observations of the motions
of the satellites relative to their primaries. Also it is found
that the value of $n^2a^3$ found for the motion of the sun relative
to the earth is the same as that found for the motions of the
other planets relative to the sun. It therefore expresses a
property of the sun rather than the earth, and we say that all 
the planets, the earth included, describe elliptic orbits about 
the sun. This is legitimate because the co-ordinates of the 
earth relative to the sun, the directions of the axes remaining 
the same, are necessarily equal and opposite to those of the 
sun relative to the earth, so that if the sun describes relative 
to the earth an elliptic orbit with the earth in a focus, then 
the earth also describes, relative to the sun, an elliptic orbit 
with the sun in a focus*. 

* Copernicus and Kepler laboured under the disadvantage of having 
no accurate observations of double stars or satellites among their data. 
Jupiter's greater satellites were discovered in 1609, the year of the 
publication of Kepler's first two laws, but their orbits are nearly circular. The 
same applies to the two largest of those of Saturn. Accurate demonstration 
of the elliptic motion in double stars is a matter of comparatively modern 
observation. If Kepler had had such data, it would not have taken him 
six years to hit on the elliptic law of planetary motions. As it was, he had 
to tackle directly the more difficult problem of the motions of the planets, 
in which the complicating influence of the earth's motion is seen at its 
worst. 

8·24. We have seen therefore that a body in the neighbourhood 
of a second body has an acceleration towards the second 
body, inversely proportional to the square of the distance 
apart, the constant of proportionality being a property of the 
second body alone. What happens when there are several 
bodies in the neighbourhood? We really have had the answer 
already in the motions of satellites; for while a satellite is 
moving about its primary it is also sharing the general motion 
of the primary about the sun. If its acceleration was merely 
that towards the primary, while the primary is moving about 
the sun, the primary would leave the satellite behind*. The 
satellite must have also an acceleration towards the sun, which 
is nearly the same as that of the primary because they are at 
nearly the same distance from the sun. We must therefore 
generalize our law to the case where $n$ bodies are moving in 
one another's neighbourhood. We say that any one body has 
an acceleration towards each of the others, whose components 
are given by our law; and the total component 
acceleration in any direction is the sum of the components 
in that direction given by the other bodies separately. 
Formally we say that if we consider the $l$th body, 

$$(\ddot x_l, \ddot y_l, \ddot z_l) = -\sum_{m} \frac{\mu_m}{r_{lm}^3}\,(x_l - x_m,\; y_l - y_m,\; z_l - z_m) \qquad (m = 1 \text{ to } n), \tag{5}$$

where the suffix $m$ refers to another body of the system, $r_{lm}$ 
is the distance between the bodies specified by $l$ and $m$, and 
the summation is for all values of $m$ except $l$. 

* This is serious; the acceleration of the moon towards the sun, for 
instance, is about twice its acceleration towards the earth. 

We notice that 

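$$r_{lm}^2 = (x_l - x_m)^2 + (y_l - y_m)^2 + (z_l - z_m)^2, \tag{6}$$

and 

$$\frac{\partial}{\partial x_l}\left(\frac{1}{r_{lm}}\right) = -\frac{x_l - x_m}{r_{lm}^3}, \tag{7}$$

with similar relations. Then if we multiply the equations (5) 
respectively by $\dot x_l, \dot y_l, \dot z_l$ and add, we get 

$$\dot x_l \ddot x_l + \dot y_l \ddot y_l + \dot z_l \ddot z_l = \sum_m \mu_m \left(\dot x_l \frac{\partial}{\partial x_l} + \dot y_l \frac{\partial}{\partial y_l} + \dot z_l \frac{\partial}{\partial z_l}\right) \frac{1}{r_{lm}}. \tag{8}$$

The left side is a complete differential with regard to the 
time. If then we integrate from time $t_0$ to time $t_1$ we get 

$$\left[\tfrac{1}{2}\left(\dot x_l^2 + \dot y_l^2 + \dot z_l^2\right)\right]_{t_0}^{t_1} = \int \left(\frac{\partial U_l}{\partial x_l}\,dx_l + \frac{\partial U_l}{\partial y_l}\,dy_l + \frac{\partial U_l}{\partial z_l}\,dz_l\right), \tag{9}$$

where 

$$U_l = \sum_m \frac{\mu_m}{r_{lm}}.$$

We can also multiply (8) by $\mu_l$ and add for all values of $l$. 
Then the pair of particles given by $l$ and $m$ make a contribution 
to the right given by 

$$\mu_l \mu_m \left(\dot x_l \frac{\partial}{\partial x_l} + \dot y_l \frac{\partial}{\partial y_l} + \dot z_l \frac{\partial}{\partial z_l} + \dot x_m \frac{\partial}{\partial x_m} + \dot y_m \frac{\partial}{\partial y_m} + \dot z_m \frac{\partial}{\partial z_m}\right) \frac{1}{r_{lm}} = \mu_l \mu_m \frac{d}{dt}\left(\frac{1}{r_{lm}}\right),$$

since $r_{lm}$ is a function of $(x_l, y_l, z_l, x_m, y_m, z_m)$ only. It follows 
that if 

$$U = \Sigma \frac{\mu_l \mu_m}{r_{lm}}, \tag{12}$$

where the summation is for all pairs of particles, 

$$\left[\tfrac{1}{2}\Sigma \mu_l \left(\dot x_l^2 + \dot y_l^2 + \dot z_l^2\right)\right]_{t_0}^{t_1} = \left[U\right]_{t_0}^{t_1}. \tag{13}$$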

This is a very remarkable result. For the expression 

$$\dot x_l^2 + \dot y_l^2 + \dot z_l^2$$

is the square of the resultant velocity of the particle given by 
$l$; where we consider the positions of the particle at times $t$ 
and $t + dt$, and take the distance between them, and define 
the resultant velocity as the limit of the ratio of this distance 
to $dt$ when $dt$ becomes very small. The quantities $\mu$ are 
properties of the various bodies and independent of their 
position. Thus the equation (13) expresses a relation between 
our fundamental notions of distance and time alone, and is 
independent of the particular set of axes of $x$, $y$ and $z$ that 
we choose. Further, (5) are equivalent to 

$$\mu_l \ddot x_l = \frac{\partial U}{\partial x_l}, \qquad \mu_l \ddot y_l = \frac{\partial U}{\partial y_l}, \qquad \mu_l \ddot z_l = \frac{\partial U}{\partial z_l}. \tag{14}$$
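
The equivalence just stated can be tried numerically. The sketch below is an illustration only: it assumes the reading $\mu_l \ddot x_l = \partial U / \partial x_l$, with $U$ the sum of $\mu_l \mu_m / r_{lm}$ over pairs, and the three positions and values of $\mu$ are arbitrary numbers chosen for the test. The function names are hypothetical. The derivative of $U$ is estimated by a central difference and compared with the acceleration given by the pairwise law.

```python
import math

def potential_U(positions, mus):
    """U: sum of mu_l * mu_m / r_lm over all pairs of particles."""
    total = 0.0
    for l in range(len(positions)):
        for m in range(l + 1, len(positions)):
            total += mus[l] * mus[m] / math.dist(positions[l], positions[m])
    return total

def acceleration_x(l, positions, mus):
    """x-component of the acceleration of body l from the pairwise law (5)."""
    total = 0.0
    for m in range(len(positions)):
        if m != l:
            r = math.dist(positions[l], positions[m])
            total -= mus[m] * (positions[l][0] - positions[m][0]) / r**3
    return total

def dU_dx(l, positions, mus, h=1e-6):
    """Central-difference estimate of dU/dx_l (h is an assumed step)."""
    def shifted(delta):
        moved = [list(p) for p in positions]
        moved[l][0] += delta
        return moved
    return (potential_U(shifted(h), mus) - potential_U(shifted(-h), mus)) / (2 * h)

# Arbitrary illustrative configuration, not data from the text.
positions = [(0.0, 0.0, 0.0), (1.0, 0.5, 0.0), (-0.3, 2.0, 1.0)]
mus = [1.0, 0.5, 0.2]

differences = [
    mus[l] * acceleration_x(l, positions, mus) - dU_dx(l, positions, mus)
    for l in range(len(positions))
]
```

The entries of `differences` should be zero apart from the small error of the finite-difference estimate.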

The generalization (5) makes a great improvement in our 
representation of the motions within the solar system. 
Kepler’s laws give a good first approximation to the motions 
of the planets ; their application to the motions of the satellites 
relative to the planets also gives a good first approximation. 
But there are outstanding discrepancies. There are periodic 
inequalities in the moon’s longitude with amplitudes of the 
order of a degree; others in the longitudes of the planets of 
the order of, in extreme cases, considerable fractions of a 
degree; there is a long-period disturbance of Saturn with a 
period of 900 years and an amplitude of nearly a degree ; and 
in addition the elements of the orbits show slow progressive 
or secular changes, the major axes in particular revolving in 
one direction or the other relative to the stars. The result of 
allowing for the mutual influence of every pair of bodies in 
the system is that nearly all these inequalities are accounted 
for. Without further modification we can account, within 
the limits of observational error, for the motion of every 
major planet from Venus to Neptune, all the asteroids, and 
most of the satellites. 
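
The rule of adding the accelerations due to the several bodies can be sketched in a few lines of Python. This is an illustration under stated assumptions, not part of the argument: the function name `accelerations` is hypothetical, the sun, earth and moon are placed in one straight line for simplicity, and the values of $\mu$ and of the distances are modern figures in SI units that do not come from the text.

```python
import math

def accelerations(positions, mus):
    """Acceleration of each body: the sum over the other bodies of
    mu_m * (r_m - r_l) / |r_m - r_l|**3, added component by component."""
    result = []
    for l, pl in enumerate(positions):
        acc = [0.0, 0.0, 0.0]
        for m, pm in enumerate(positions):
            if m == l:
                continue
            d = [pm[k] - pl[k] for k in range(3)]
            r = math.sqrt(sum(c * c for c in d))
            for k in range(3):
                acc[k] += mus[m] * d[k] / r**3
        result.append(acc)
    return result

# Assumed modern values (SI units); not taken from the text.
MU_SUN = 1.327e20        # m^3 / s^2
MU_EARTH = 3.986e14      # m^3 / s^2
MU_MOON = 4.9e12         # m^3 / s^2
SUN_DISTANCE = 1.496e11  # m, sun to earth
MOON_DISTANCE = 3.844e8  # m, earth to moon

# Sun at the origin, earth and moon on the x axis (an assumed arrangement).
bodies = [(0.0, 0.0, 0.0),
          (SUN_DISTANCE, 0.0, 0.0),
          (SUN_DISTANCE + MOON_DISTANCE, 0.0, 0.0)]
mus = [MU_SUN, MU_EARTH, MU_MOON]
moon_acc = accelerations(bodies, mus)[2]

# The two separate contributions to the moon's acceleration.
moon_towards_sun = MU_SUN / SUN_DISTANCE**2
moon_towards_earth = MU_EARTH / MOON_DISTANCE**2
ratio = moon_towards_sun / moon_towards_earth
```

With these assumed values `ratio` comes out a little over two, so that the sun's share of the moon's acceleration is by no means a small correction.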

The outstanding discrepancies all concern cases where the 
body whose motion we are considering is very near its 
primary. This fact suggests an explanation; for we have seen 
that the acceleration of a body is always towards the body that 
produces it. If the latter is remote, the lines joining the first 
body to all parts of the second are nearly in the same direc- 
tion. But if the bodies are fairly close together these lines are 
not in the same direction, and if the second body is not 
spherical its field will not be symmetrical, and the acceleration 
of the first body will not necessarily be directed towards a 
fixed point of the second. Such considerations do as a matter 
of fact account for most of the outstanding inequalities of the 
satellites. The only remaining inequality of importance con- 
cerns Mercury ; its discussion is reserved till later. 

We have seen that the acceleration of any body can be con- 
sidered as made up of contributions from the others, each of 
which can be said to be due to another particular body in the 
sense that it would be zero if the other body were not present. 
If then we denote the part of the acceleration of the particle $l$ 
due to the particle $m$ by $(\ddot x_{lm}, \ddot y_{lm}, \ddot z_{lm})$, we have 

$$\mu_l \ddot x_{lm} + \mu_m \ddot x_{ml} = 0, \text{ etc.} \tag{15}$$

We can call the terms in this equation the respective forces 
of the bodies on each other, and we arrive at a result equiva- 
lent to Newton's third law, that action and reaction are equal 
and opposite. 

The equation (15) is probably most directly verified in the 
solar system by the mutual perturbations of Jupiter and Sa- 
turn, and by those of the four great satellites of Jupiter. The 
disturbance of the position of the sun by the attractions of the 
planets affects the positions of the planets relative to it, but 
cannot be disentangled explicitly. The earth and moon move 
about their common centre of gravity, which moves practi- 
cally like a single particle. There is therefore a monthly 
oscillation of the earth's position, which is shown by a corre- 
sponding variation in the apparent direction of the sun, and 
gives a means of determining $\mu$ for the moon. But there is no 
other way of finding $\mu$ for the moon from the translational 
motions of bodies, so that this determination is at present 
merely an application of the principle and not a check 
on it. 

We may remark that there is nothing conventional about 
the quantities $\mu$. They are perfectly definite derived magnitudes. 
Thus 

$$\mu \text{ for the sun} = 4\pi^2\, \frac{(\text{earth's mean distance})^3}{(1 \text{ year})^2}.$$

The notion of mass has not yet appeared explicitly. 
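
As an arithmetical illustration of the last formula, the following sketch evaluates $\mu$ for the sun in two systems of units. The function name `mu_from_orbit` is hypothetical, and the metric values of the earth's mean distance and of the year, as well as the comparison figure, are modern values assumed for the purpose; none of them appears in the text.

```python
import math

def mu_from_orbit(mean_distance, period):
    """mu = 4 * pi^2 * a^3 / T^2 for a body describing an orbit of mean
    distance a in period T about a much larger body."""
    return 4.0 * math.pi**2 * mean_distance**3 / period**2

# In the units of the text (earth's mean distance and the year) mu is 4*pi^2.
mu_sun_natural = mu_from_orbit(1.0, 1.0)

# Assumed modern metric values, not taken from the text.
EARTH_MEAN_DISTANCE_M = 1.495978707e11   # metres
YEAR_S = 365.25 * 86400.0                # seconds
mu_sun_si = mu_from_orbit(EARTH_MEAN_DISTANCE_M, YEAR_S)

# Accepted modern value of mu for the sun, quoted only for comparison.
MU_SUN_REFERENCE = 1.32712e20            # m^3 / s^2
relative_difference = abs(mu_sun_si - MU_SUN_REFERENCE) / MU_SUN_REFERENCE
```

The point of the sketch is that only a length and a time enter, so that $\mu$ is a derived magnitude whose numerical value depends on the units of length and time and on nothing else.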

8·3. There is one apparent inconsistency in the development 
given so far. In considering the motion of a body near the 
earth’s surface, we referred it to an origin at the centre of the 
earth and axes fixed in the earth. In considering the motions 
of the bodies in the solar system we have used axes whose 
directions are fixed with reference to the stars. But axes 
fixed in the earth do not keep the same direction with refer- 
ence to the stars, or conversely; and it is easy to see that the 
equations of motion cannot keep the same form if the axes 
are rotating. Thus our equations 

$$(\ddot x, \ddot y, \ddot z) = -\frac{\mu}{r^3}\,(x, y, z) \tag{1}$$

are satisfied as they stand if 

$$x = a \cos \omega t; \qquad y = a \sin \omega t; \qquad z = 0, \tag{2}$$

where $a$ and $\omega$ are constants such that 

$$\omega^2 a^3 = \mu. \tag{3}$$

But if we take axes of $(x', y', z')$ rotating about the $z$ axis with 
angular velocity $\omega$, the co-ordinates in the two systems are 
connected by relations 

$$x' = x \cos \omega t + y \sin \omega t; \qquad y' = -x \sin \omega t + y \cos \omega t; \qquad z' = z. \tag{4}$$

Then 

$$x' = a, \qquad y' = 0, \qquad z' = 0, \tag{5}$$

but in these co-ordinates the equations of motion in the form 
(1) are not satisfied, for the first reduces to the impossible 
form 

$$0 = -\mu / a^2.$$

It appears therefore that we cannot retain the same form of 
the equations of motion for all sets of axes in relative rotation. 
If the form (1) is true for axes with directions fixed in relation 
to the stars, it cannot be correct for axes fixed in relation to 
the earth, and conversely. Now our study of the motions 
within the solar system has been made with reference to axes 
fixed with reference to the stars. If we assume this to be true 
in general a modification is needed for axes fixed in the earth, 
which is given in books on dynamics. It is actually found to 
be small for a projectile moving near the earth's surface, 
really because the time of flight is always so small that the 
earth rotates through only a small angle during it. But the 
correction is appreciable in long-range gunfire, and has to be 
taken into account in accurate shooting. The equations there- 
fore hold for axes fixed in direction in relation to the stars. 
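
The failure of the equations in rotating axes can be exhibited numerically. The sketch below is illustrative only: the values $\mu = 1$ and $a = 2$, the instant chosen, and the function names are assumptions, and the second derivative is estimated by a central difference. It forms the quantity $\ddot x + \mu x / r^3$, which vanishes when the first of the equations of motion is satisfied, for the circular motion referred first to fixed and then to rotating axes.

```python
import math

MU = 1.0                        # assumed value of mu
A = 2.0                         # assumed radius of the circular orbit
OMEGA = math.sqrt(MU / A**3)    # so that omega^2 * a^3 = mu

def fixed_axes(t):
    """Circular motion referred to axes fixed in direction."""
    return (A * math.cos(OMEGA * t), A * math.sin(OMEGA * t), 0.0)

def rotating_axes(t):
    """The same motion referred to axes rotating about z with velocity omega."""
    x, y, z = fixed_axes(t)
    c, s = math.cos(OMEGA * t), math.sin(OMEGA * t)
    return (x * c + y * s, -x * s + y * c, z)

def residual_x(path, t, h=1e-3):
    """x_ddot + mu * x / r^3; zero if the first equation of motion holds."""
    before, here, after = path(t - h), path(t), path(t + h)
    x_ddot = (after[0] - 2.0 * here[0] + before[0]) / h**2
    r = math.sqrt(sum(c * c for c in here))
    return x_ddot + MU * here[0] / r**3

t = 0.7                         # an arbitrary instant
fixed_residual = residual_x(fixed_axes, t)
rotating_residual = residual_x(rotating_axes, t)
```

In the fixed axes the residual is zero to the accuracy of the difference formula; in the rotating axes $x'$ is constant, its second derivative vanishes, and the residual is left equal to $\mu / a^2$.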

In the last resort this statement requires to be made a little 
more precise; for, though the angles between the directions 
of the stars are nearly constant, they are not quite so. The 
stars have slow proper motions among themselves, and if we 
fix the directions of our axes with regard to one pair of stars 
they will vary with regard to another pair. In practice, however, 
it is found that there is an abundance of stars the 
angles between whose directions remain constant as nearly as 
we can observe. These are the distant stars, and we use these 
as our general standards of direction. But strictly, even the 
most distant stars must have accelerations on account of the 
law itself, and we can never identify absolutely non-rotating 
axes. This does not affect our belief in the truth of the law, 
of course. The law is approximately true as a matter of ob- 
servation, and fits the observations as closely as we can tell ; 
given these properties, the law has a high probability because 
it is simple. 

The position of the origin has been left somewhat vague. 
It is plain that if we take a different origin moving with a 
uniform velocity with respect to the first one, the co-ordinates 
are merely reduced by quantities of the form $a + ut$, which 
are the same for all particles. Hence quantities of the forms 
$\ddot x$ and $x_l - x_m$ are exactly as before. The equations of motion 
are unaffected by a displacement or a uniform velocity of the 
origin. This is Newton's principle of relativity. But actually 
we may take a new origin moving in any way whatever. If 
we do so, the quantities $x_l - x_m$, and therefore all distances, 
remain unaltered; $\ddot x$ may be changed, but by the same amount 
for all bodies, and therefore $\ddot x_l - \ddot x_m$ is unchanged. The 
differential equations for the differences of the co-ordinates of 
the various bodies remain as before. But actually we can 
never observe actual co-ordinates; all we observe are the 
differences of the co-ordinates. It appears therefore, that as 
far as actual tests are concerned the origin may move in any 
way whatever and the equations of motion will still lead to 
correct results for the quantities that we can observe. 

There may, however, be advantages in having equations 
of motion that are actually true, rather than such as contain 
errors that can never be discovered. If we consider the point 
with co-ordinates $(\bar x, \bar y, \bar z)$, which we may call the centroid of 
the universe, defined by 

$$(\Sigma \mu_l)\, \bar x = \Sigma \mu_l x_l, \tag{1}$$

with similar equations, we have 

$$(\Sigma \mu_l)\, \ddot{\bar x} = \Sigma \mu_l \ddot x_l = \sum_{l,m} \mu_l \ddot x_{lm} = 0, \tag{2}$$

if the equations of motion are true as they stand. Then the 
centroid moves with uniform velocity with reference to the 
origin. Conversely, if our origin moves with uniform velocity 
with reference to the centroid of the universe, the equations 
of motion are true. We notice that if we shift the origin to 
the centroid, the new co-ordinates take the form 

$$x_l - \bar x = \frac{\Sigma \mu_m (x_l - x_m)}{\Sigma \mu_m}.$$

Thus the equations in this form involve differences of the 
co-ordinates alone. 

With these co-ordinates, writing $x_l' = x_l - \bar x$, 

$$\Sigma \mu_l \dot x_l^2 = \Sigma \mu_l \dot{\bar x}^2 + 2 \dot{\bar x}\, \Sigma \mu_l \dot x_l' + \Sigma \mu_l \dot x_l'^2,$$

and the second term vanishes by (1). Also $\dot{\bar x}$ is constant or 
zero. Thus 

$$\Sigma \mu_l (\dot x_l^2 + \dot y_l^2 + \dot z_l^2) = \Sigma \mu_l (\dot{\bar x}^2 + \dot{\bar y}^2 + \dot{\bar z}^2) + \Sigma \mu_l (\dot x_l'^2 + \dot y_l'^2 + \dot z_l'^2), \tag{5}$$

in which the first term is constant if we have chosen the 
origin suitably. 
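
The separation of the sum of squares of the velocities into a part belonging to the centroid and a part relative to it can be checked on arbitrary numbers. In the sketch below the values of $\mu$ and the velocities are invented for illustration, and the function names are hypothetical; the only thing tested is the algebraic identity, together with the vanishing of the cross term.

```python
def weighted_mean(vectors, mus):
    """Components of the mu-weighted mean of a list of 3-vectors."""
    total = sum(mus)
    return [sum(mu * v[k] for mu, v in zip(mus, vectors)) / total
            for k in range(3)]

def weighted_square_sum(vectors, mus):
    """Sum over particles of mu * (vx^2 + vy^2 + vz^2)."""
    return sum(mu * sum(c * c for c in v) for mu, v in zip(mus, vectors))

# Arbitrary illustrative values, not data from the text.
mus = [3.0, 1.0, 0.5, 0.25]
velocities = [(1.0, -2.0, 0.5), (0.0, 3.0, 1.0),
              (-4.0, 0.5, 2.0), (2.5, 2.5, -1.0)]

centroid_velocity = weighted_mean(velocities, mus)
relative = [[v[k] - centroid_velocity[k] for k in range(3)] for v in velocities]

left = weighted_square_sum(velocities, mus)
right = (sum(mus) * sum(c * c for c in centroid_velocity)
         + weighted_square_sum(relative, mus))

# The term that vanishes by the definition of the centroid.
cross_term = weighted_mean(relative, mus)
```

Here `left` and `right` agree to rounding error, and each component of `cross_term` is zero to the same accuracy.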

8·4. When we come to deal with bodies that are not moving 
freely we find a difference at once. The acceleration becomes 
infinite when the bodies become indefinitely close if the foregoing 
equations are true. Actually when two bodies come into 
contact the relative acceleration disappears; the law of attraction 
undergoes serious modification at this stage*. What are 
we to say about a book lying on a table? Two courses are available. 
The book is not in contact with the earth; we may say 
that due to the earth the book has, as usual, the acceleration $g$ 
downwards, but owing to the proximity of the table it has 
also an acceleration $g$ upwards, and the two cancel. Otherwise 
we may just say that the acceleration is zero and leave it 
at that. It is in many cases a matter of convenience which 
course we adopt; both give the same answer as far as observable 
phenomena are concerned. But the discussion of the 
additional reactions associated with impenetrability and friction 
has the important feature that it brings out new physical 
laws. 

* Humpty-Dumpty would, of course, have been quite wrong had he 
said that Impenetrability meant a nice knock-down argument. The 
impenetrability of the wall was what was keeping him in position in spite of 
gravity; it was when a gust of wind removed him from its range of influence 
that he became a freely moving body and could be said to be knocked down. 

Consider a common balance with a fixed counterpoise in 
the pan A. We place various bodies in the pan B. According 
to our results for freely moving bodies, each of these has its 
appropriate $\mu$, and each has an acceleration $g$ downwards due 
to the earth. Then in terms of equation (1) there is something 
associated with the effect of the earth on each body that is 
expressed by the quantity $\mu g$. If the balance is held with the 
counterpoise in place and released, the counterpoise goes 
down and the pan B rises. If the experiment is repeated with 
various bodies in the pan B, some rise and others fall when 
the balance is released. It appears that the effect of the 
balance on some bodies is enough to overcome $\mu g$, and in 
others is not. But in each case the balance itself, with the 
counterpoise, starts off from the same conditions. Thus the 
operation of using the balance classifies bodies according to 
their values of $\mu g$ or, since $g$ is the same for all, according to 
their values of $\mu$. 

We have a check on this. If we return to the problem of 
the solar system and suppose that the distances between the 
bodies specified by $m_1, m_2, m_3, \ldots m_n$ are all small compared 
with their distance from the body specified by $l$, the acceleration 
of the last due to all together is nearly 

$$(\mu_{m_1} + \mu_{m_2} + \cdots + \mu_{m_n}) / r_{lm}^2$$

towards the centroid of all the particles. Thus for particles 
close together the effects on another body are expressible by 
treating $\mu$ as an additive quantity. If we place several bodies 
in the pan of a balance at once, similarly, the effect of the 
balance on all together is in opposition to the effects of the 
earth on all together; and since we must measure the effect of 
all on the earth by the sum of the values of $\mu g$, we naturally 
measure the effect of the earth on them also by the sum of 
the values of $\mu g$. 
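
The additive property of $\mu$ for bodies close together may be illustrated by a small computation. The configuration below is invented: three bodies about one unit apart are placed about a thousand units from the body considered, which is taken at the origin, and the function names are hypothetical. The exact sum of the separate accelerations is compared with that due to a single body of $\mu$ equal to the sum, placed at the centroid.

```python
import math

def exact_acceleration(points, mus):
    """Acceleration at the origin due to bodies at `points`: the sum of
    mu * p / |p|^3 over the bodies."""
    acc = [0.0, 0.0, 0.0]
    for mu, p in zip(mus, points):
        r = math.sqrt(sum(c * c for c in p))
        for k in range(3):
            acc[k] += mu * p[k] / r**3
    return acc

def lumped_acceleration(points, mus):
    """Acceleration at the origin due to one body whose mu is the sum of
    the separate values, placed at the centroid of the group."""
    total = sum(mus)
    centre = [sum(mu * p[k] for mu, p in zip(mus, points)) / total
              for k in range(3)]
    r = math.sqrt(sum(c * c for c in centre))
    return [total * c / r**3 for c in centre]

# Invented configuration: a compact group far from the origin.
mus = [1.0, 2.0, 0.5]
points = [(1000.0, 0.0, 0.0), (1001.0, 1.0, 0.0), (999.0, -1.0, 2.0)]

exact = exact_acceleration(points, mus)
lumped = lumped_acceleration(points, mus)
error = math.sqrt(sum((e - a) ** 2 for e, a in zip(exact, lumped)))
relative_error = error / math.sqrt(sum(e * e for e in exact))
```

The relative error is of the order of the square of the ratio of the size of the group to its distance, which is why the approximation is good for bodies in the same pan of a balance.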

It appears therefore that a balance with a standard counter- 
poise provides a means of discriminating between bodies, or 
combinations of bodies, according to the sums of their respective 
$\mu$'s; and these sums have the additive property. It 
follows that the mass $m$ of a body, determined by the balance 
as a fundamental magnitude, is proportional to the derived 
magnitude $\mu$. Being a fundamental magnitude, mass has to 
be measured in terms of a unit; the value of $\mu$ for this unit 
mass remains to be determined. We shall denote it by $f$, so 
that $\mu = fm$. We can now generalize the notion of mass to 
bodies too large or too inaccessible to be weighed on a balance, 
by saying that it is proportional to $\mu$, which exists in general. 
Then our law 8·24 (15) relating the effects of two bodies on 
each other takes the form, for any pair of masses $m_1$ and $m_2$, 

$$m_1 \ddot x_{12} + m_2 \ddot x_{21} = 0.$$

In this form we call $m_1 \ddot x_{12}$ the force on $m_1$ due to $m_2$, and 
$m_2 \ddot x_{21}$ the force on $m_2$ due to $m_1$. Then we have Newton's 
third law in its usual form, that the forces on two bodies due 
to each other are equal and opposite. We may denote them 
respectively by $X_{12}$ and $X_{21}$. 

We have also the law that the acceleration of a body is the 
sum of those due to the other bodies in the world, obtained 
by adding the components in Cartesian co-ordinates. If we 
simply multiply this equation by the mass of the body, we 
have the rule for the composition of forces due to different 
bodies, that the total force on a body is the resultant of the 
forces due to the other bodies, obtained by adding their 
components in Cartesian co-ordinates. 

The reasons why force is interesting are then, first, that 
it has the symmetrical property that action and reaction are 
equal and opposite; second, that the forces acting on a body 
due to other bodies are additive; third, that when the state of 
a system is known the forces are found to be determinate 
functions of the co-ordinates and possibly the velocities. 
Thus the equations of the form $m \ddot x = X$ are strictly differential 
equations for the co-ordinates. 

Strictly these results have been established only for freely 
moving bodies. We proceed to extend them to bodies in 
general, even when the phenomena of impenetrability and 
friction arise. Their complete verification is then impossible, 
because motion in one co-ordinate is always prevented by 
the connexion of the apparatus with the earth, and though 
there may be a force between the apparatus and the earth we 
can never measure the corresponding acceleration of the earth. 

But their partial verification is easy. Consider a body of 
mass $m_1$ hanging by a string of mass $m_2$ from the pan of a 
balance, the whole being at rest. The body is acted on by a 
force $m_1 g$ downwards, which we call its weight. It has no 
acceleration. Therefore the string is producing on it a force 
$m_1 g$ upwards. If action and reaction are equal and opposite, 
the body is therefore producing a downward force $m_1 g$ on 
the string. Also the weight of the string is $m_2 g$, so that there 
is a total downward force $(m_1 + m_2) g$ acting on the string. 
But the string has no acceleration and must therefore be 
acted on by an upward force $(m_1 + m_2) g$ from the balance. 
Therefore the pan of the balance is subject to a downward 
force $(m_1 + m_2) g$. But if we untie the string and place the 
string and body in the pan of the balance the counterpoise is 
undisturbed. The force on the balance pan in the two cases 
is therefore the same; and in the second case we know directly 
that it is equal to the sum of the weights of the two bodies. We 
have therefore a verification of a direct inference from the laws. 

8·5. Now consider a body with a plane face resting on an 
inclined plane at an inclination $\alpha$ to the horizontal. It is subject 
to a vertical acceleration $g$ due to the earth. This is equivalent, 
whatever axes are chosen, to an acceleration $g \cos \alpha$ normally 
into the plane and another $g \sin \alpha$ down the plane. If the 
body remains at rest, these must be balanced by reactions due 
to the plane. If $m$ is the mass of the body, there must therefore 
be acting on it a force $mg \cos \alpha$ normally and one $mg \sin \alpha$ up 
the plane. We call these the normal and frictional reactions. 
The first preserves impenetrability, the second prevents slip. 

But if the slope of the plane is gradually increased it is 
found that at a certain inclination $\lambda$ the body can no longer 
remain at rest, but slides down the plane. At greater inclinations 
it has an acceleration $g (\sin \alpha - \tan \lambda \cos \alpha)$. The first 
term corresponds to the component of gravity. The second 
shows that there is a frictional reaction $mg \tan \lambda \cos \alpha$, acting 
against the motion. There is still no normal acceleration, and 
therefore the normal reaction is still $mg \cos \alpha$. The ratio of 
the frictional reaction to the normal one is therefore $\tan \lambda$ 
when $\alpha > \lambda$ and $\tan \alpha$ when $\alpha < \lambda$. The constant $\lambda$ is a property 
of the nature of the surfaces in contact. 

But if the body is projected up the plane, or if it is projected 
down the plane when $\alpha < \lambda$, it is found that the portions 
$mg \sin \alpha$ and $mg \tan \lambda \cos \alpha$ behave differently. The former 
always acts down the plane. The latter acts against the direction 
of motion, whatever that may be. If we take the body 
by hand and slide it about the plane by applying a force 
parallel to the plane, the former force is helping us when we 
push the body down the plane, and opposing us when we 
push it upwards. The latter is always opposing us. This 
introduces us to the distinction between conservative and 
non-conservative forces, which requires further elucidation. 

The equation 8·24 (13), with a suitable origin, is equivalent to 

$$\left[\tfrac{1}{2} \Sigma m_l (\dot x_l^2 + \dot y_l^2 + \dot z_l^2)\right]_{t_0}^{t_1} = \left[\Sigma \frac{f m_l m_m}{r_{lm}}\right]_{t_0}^{t_1}. \tag{1}$$

In this form we may call the left side the kinetic energy of the 
system, and the contribution to it from each body the kinetic 
energy of that body. If we call the expression on the left 
$T$, and that on the right $U$, then however the system may 
move $T - U$ remains constant. If it should happen that the 
system ever gets back to its initial position, it will have its 
initial kinetic energy again. 

If the total force acting on any body is $(X, Y, Z)$, and we 
start from the three equations of motion of the type 

$$m \ddot x = X, \tag{2}$$

we can infer 

$$\left[\tfrac{1}{2} m (\dot x^2 + \dot y^2 + \dot z^2)\right]_{t_0}^{t_1} = \int_{t_0}^{t_1} (X \dot x + Y \dot y + Z \dot z)\, dt. \tag{3}$$

We express this in words by saying that the increase in 
kinetic energy is equal to the work done on the body. If we 
have a system of bodies we can add the equations of this type 
for all, and say that the increase of kinetic energy of the 
system is equal to the total work done on all the bodies. In 
any case this applies to the actual path. But we have seen 
that the forces $(X, Y, Z)$ are determinable as functions of 
the positions and velocities. However we imagine the system 
to travel from its initial position to its final one, by whatever 
path and at whatever rate, the integral on the right of (3) 
has some value, provided we give $(X, Y, Z)$ the proper values 
for a body with the co-ordinates and components of velocity 
that we are considering. It happens in many cases that the 
value is the same whatever path we choose, provided the 
initial and final positions are the same. This is true, for in- 
stance, in the motion of the bodies of the solar system. If so, 
the work done is a function of the initial and final positions 
only and not of the path taken. Such a system is called 
conservative. 

In the case of the body on the inclined plane, if we imagine 
it displaced a distance $s$ down the plane, the work done is 
$mgs (\sin \alpha - \tan \lambda \cos \alpha)$. If it is then brought back to the 
starting point, the force is now $mg (\sin \alpha + \tan \lambda \cos \alpha)$ 
against the direction of motion, and the work done is 
$-mgs (\sin \alpha + \tan \lambda \cos \alpha)$. Adding the two together we have 
the total work done, $-2mgs \tan \lambda \cos \alpha$. This depends on $s$ 
and therefore on where the body has been in passing from 
its initial to its final position. The system is not conservative. 
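
The arithmetic of the excursion down the plane and back can be set out as a short sketch. The numerical values of the mass, the distance, the two angles and $g$ are assumptions made for illustration, and the function names are hypothetical; the angle of the plane is written `alpha` and the limiting angle at which sliding begins is written `lam`.

```python
import math

G = 9.81  # assumed value of g in m/s^2; any positive value serves

def sliding_acceleration(alpha, lam, g=G):
    """Acceleration down the plane when the inclination alpha exceeds lam."""
    return g * (math.sin(alpha) - math.tan(lam) * math.cos(alpha))

def work_down(m, s, alpha, lam, g=G):
    """Work done on the body in a displacement s down the plane."""
    return m * g * s * (math.sin(alpha) - math.tan(lam) * math.cos(alpha))

def work_up(m, s, alpha, lam, g=G):
    """Work done on the body when it is brought back a distance s up the
    plane; gravity and friction now both oppose the motion."""
    return -m * g * s * (math.sin(alpha) + math.tan(lam) * math.cos(alpha))

# Illustrative values, not taken from the text.
m, s = 2.0, 3.0
alpha, lam = math.radians(35.0), math.radians(20.0)

round_trip = work_down(m, s, alpha, lam) + work_up(m, s, alpha, lam)
friction_only = -2.0 * m * G * s * math.tan(lam) * math.cos(alpha)
```

The gravitational parts cancel over the closed path, and `round_trip` is equal to `friction_only`, which is negative and proportional to the distance traversed.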

It appears that the work done by non-conservative forces is 
always negative. It is found also that, associated with it, there 
is a change in the state of the system, which we call heating, 
and can detect by direct sensation or by a thermometer. 

8·6. A further refinement must be introduced at this stage. 
We have proceeded so far by supposing that the position of 
a real body can be expressed by three co-ordinates $(x, y, z)$. 
This is not actually true, because a body has finite size and 
may have rotation. But in fact co-ordinates are obtained in 
the process of measurement, and we must examine the 
meaning of the co-ordinates we have obtained. For a planet 
or a star we have not watched a particular mark on the surface; 
our result is really that there is a point within the body whose 
co-ordinates do satisfy our equations. Actual marks on the 
surface have additional accelerations in consequence of the 
rotation, and do not satisfy the equations as they stand*. In 
reality such expressions as $m \ddot x$ have no meaning unless the 
value of $\ddot x$ can be treated as uniform throughout the region 
considered. But then the region is in general only a small 
portion of that occupied by the body, and we should take 
into account the forces due to the other portions of the body 
that surround it. 

* A particle at the earth's equator has an acceleration of 3·4 cm./sec.$^2$ 
towards the axis on account of rotation; the general acceleration of the 
earth towards the moon is 3·4 × 10$^{-3}$ cm./sec.$^2$. 

The principle actually used is that the internal reactions 
between portions of a body constitute a system in equilibrium. 
There seem to be several grounds given for accepting this. 
If we consider any particle at $(x, y, z)$, we can write its 
equations of motion in the form 

$$m \ddot x = X + X', \tag{1}$$

where $X$ is the force on it due to external bodies and $X'$ that 
due to other parts of the same body. If we add up these 
equations for all particles we get 

$$\Sigma m \ddot x = \Sigma X + \Sigma X'. \tag{2}$$

Also from the equations of the form (1) we can obtain three 
of the form 

$$\Sigma m (y \ddot z - z \ddot y) = \Sigma (yZ - zY) + \Sigma (yZ' - zY'). \tag{3}$$

Now if we consider any pair of particles $m_1$ and $m_2$, their 
forces on each other are equal and opposite. Hence in the 
sum $\Sigma X'$ the forces cancel in pairs, and the total is zero. 
Also since the forces do act on both particles, they must act 
along the line joining them*. Hence if the particles are at 
$(x_1, y_1, z_1)$ and $(x_2, y_2, z_2)$ we have for the forces on $m_1$ due to $m_2$, 

$$\frac{X'}{x_2 - x_1} = \frac{Y'}{y_2 - y_1} = \frac{Z'}{z_2 - z_1}, \tag{4}$$

and those on $m_2$ due to $m_1$ are equal and opposite. Hence the 
pair make a contribution to $\Sigma (yZ' - zY')$ equal to 

$$(y_1 Z' - z_1 Y') - (y_2 Z' - z_2 Y') = 0, \tag{5}$$

by (4). Hence in (3) the reactions cancel in pairs and 
contribute zero to the total. 
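
The cancellation in pairs can be verified on an invented set of particles. In the sketch below each pair acts along the line joining its members with an arbitrary strength, attractive or repulsive; the positions, the strengths and the function name are assumptions made for the illustration. The totals corresponding to $\Sigma X'$ and to the three sums of the type $\Sigma (yZ' - zY')$ are accumulated.

```python
def internal_totals(points, strengths):
    """Total internal force and total moment about the origin when every
    pair (i, j) acts along the joining line: the force on i due to j is
    k * (r_j - r_i), and that on j due to i is equal and opposite."""
    force = [0.0, 0.0, 0.0]
    moment = [0.0, 0.0, 0.0]
    for (i, j), k in strengths.items():
        on_i = [k * (points[j][a] - points[i][a]) for a in range(3)]
        on_j = [-c for c in on_i]
        for p, f in ((points[i], on_i), (points[j], on_j)):
            for a in range(3):
                force[a] += f[a]
            # Components (yZ' - zY', zX' - xZ', xY' - yX').
            moment[0] += p[1] * f[2] - p[2] * f[1]
            moment[1] += p[2] * f[0] - p[0] * f[2]
            moment[2] += p[0] * f[1] - p[1] * f[0]
    return force, moment

# Invented positions and pair strengths, not data from the text.
points = [(0.0, 0.0, 0.0), (1.0, 2.0, -1.0), (-2.0, 0.5, 3.0)]
strengths = {(0, 1): 0.7, (0, 2): -1.3, (1, 2): 2.1}

total_force, total_moment = internal_totals(points, strengths)
```

Both totals vanish to rounding error. If the force on one member of a pair were given a component across the joining line, the force total would still vanish but the moment total would not, which is the point of the condition (4).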

If we accept the atomic constitution of matter and suppose 
all atoms and electrons to act radially on one another this 
argument is valid. It has, however, seemed premature to 
many to accept it at the present stage. It is not obvious that 
such an ultimate analysis of the reactions is possible. 

An alternative is to say that the internal forces depend on 
the body itself and not on outside agencies. Suppose then 
that the external forces are zero. If then the contributions 
to (2) and (3) from the internal forces were not zero, the body 
would begin to move of its own accord. For rigid bodies this 
does not happen, as a matter of experimental fact. But it 
seems wrong to generalize this to bodies under external forces 
and in a state of rotation. The internal forces are then cer- 
tainly different from what they are in a stationary body under 
no external force, and this procedure gives us no ground for 
believing that the additional forces satisfy the rule. 

It seems to me that the proper procedure is to recognize 
that the principle of d'Alembert has a moderate probability 
on account of the considerations on mutual influence of 
particles, and to investigate its consequences. If they are 
found to agree with experiment the principle becomes a 
scientific law in the usual sense. Now if we define the centroid 
of a body by the equations 

$$(\Sigma m)\, \bar x = \Sigma m x; \qquad (\Sigma m)\, \bar y = \Sigma m y; \qquad (\Sigma m)\, \bar z = \Sigma m z, \tag{6}$$

it follows from d'Alembert's principle that 

$$(\Sigma m)\, \ddot{\bar x} = \Sigma X, \tag{7}$$

with two similar equations. The equations of motion that 
we have used hitherto are therefore satisfied by the 
co-ordinates of the centroid. 

* If the particles are magnetic doublets this is not true. 

It remains to show that the centroid is actually fixed in the 
body. This is usually taken for granted, but it is not obvious 
that the centroid, which is so far merely a point whose 
co-ordinates are defined by (6), is also always at the same 
particle of the body. But if we consider the distance $r_1$ of 
the centroid from a given particle at $(x_1, y_1, z_1)$ and denote 
other particles by $m_l$ at $(x_l, y_l, z_l)$ we have 

$$(\Sigma m)(x_1 - \bar x) = (\Sigma m)\, x_1 - \Sigma m x = \Sigma m_l (x_1 - x_l), \tag{8}$$

and 

$$(\Sigma m)^2 r_1^2 = \{\Sigma m_l (x_1 - x_l)\}^2 + \{\Sigma m_l (y_1 - y_l)\}^2 + \{\Sigma m_l (z_1 - z_l)\}^2. \tag{9}$$

The square terms on the right give 

$$\Sigma m_l^2 \{(x_1 - x_l)^2 + (y_1 - y_l)^2 + (z_1 - z_l)^2\} = \Sigma m_l^2 r_{1l}^2. \tag{10}$$

The product terms are of the form 

$$2 \Sigma m_l m_{l'} \{(x_1 - x_l)(x_1 - x_{l'}) + (y_1 - y_l)(y_1 - y_{l'}) + (z_1 - z_l)(z_1 - z_{l'})\} = 2 \Sigma m_l m_{l'} r_{1l} r_{1l'} \cos (l1l'), \tag{11}$$

where by $(l1l')$ we mean the angle subtended at $m_1$ by the line 
joining the particles $m_l$ and $m_{l'}$, and the summation is for 
all pairs of particles. But in a rigid body all distances and 
angles defined by pairs and triads of given particles are 
constant. Hence every term on the right of (10) and (11) is 
constant, and therefore $r_1$ is constant. Thus the centroid is 
at a constant distance from any particle of the body and 
therefore is fixed in the body itself. 

Now we can denote the sum of the masses of the particles 
by $M$, which is the mass of the body as a whole. The 
equations (7) now reduce to the usual form 

$$M \ddot{\bar x} = \Sigma X,$$

and so on. The other equations (3) can be shown, as is done 
in works on dynamics, to determine the motion of a rigid 
body about its centroid : that is to say, its rotation. 
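
That the centroid keeps its place in a rigid body can be illustrated by moving an invented set of particles rigidly and recomputing the centroid from its definition. The sketch below is only one instance: the displacement chosen is a rotation about the $z$ axis followed by a translation, the masses and positions are arbitrary, and the function names are hypothetical.

```python
import math

def centroid(masses, points):
    """Point whose co-ordinates are the mass-weighted means of those of the
    particles."""
    total = sum(masses)
    return tuple(sum(m * p[k] for m, p in zip(masses, points)) / total
                 for k in range(3))

def move_rigidly(point, angle, shift):
    """Rotate about the z axis through `angle`, then translate by `shift`."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])

# Invented rigid body of four particles, not data from the text.
masses = [2.0, 1.0, 3.0, 0.5]
body = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (1.0, 1.0, 3.0)]
moved = [move_rigidly(p, 0.9, (5.0, -2.0, 1.5)) for p in body]

# Distance of the centroid from each particle, before and after the motion.
before = [math.dist(centroid(masses, body), p) for p in body]
after = [math.dist(centroid(masses, moved), p) for p in moved]
```

The two lists agree to rounding error, so that the centroid computed afresh after the displacement lies at the same distances from all the particles as before.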

8·61. We have seen that the gravitational force on one body
due to another is of the form $f m_1 m_2 / r_{12}^2$, acting along the line
joining them. When the bodies are of finite size both the
magnitude and direction of the force are somewhat vague,
but we can make them precise also by considering the bodies
as made up of particles. If then $X$ is the force on a particle of
mass $m$ due to all other particles, we can write

$$X = f m \frac{\partial V}{\partial x}, \qquad V = \Sigma \frac{m'}{r},$$

where $m'$ is the mass of another particle, $r$ the distance be-
tween $m$ and $m'$, and the summation is for all the other
particles. The function V has a definite value at all places 
inside or outside of bodies, and is called the gravitation 
potential. It can be applied to determine the quantities of the 
form $\Sigma X$ and $\Sigma (yZ - zY)$ for any body, and hence to obtain
differential equations for the motion of the body. For bodies
that have not spherical symmetry the sums $\Sigma (yZ - zY)$ do
not in general vanish, and consequently produce changes in 
the rotation. This result can be checked by appeal to the 
motion of the earth. The earth is not a sphere, but an oblate 
spheroid. The attractions of the sun and moon on it produce 
changes of the rotation of various types that can be predicted 
theoretically. The axis of figure describes a cone relative to the 
centroid of the earth, the axis of the cone being at right angles 
to the plane of the earth's orbit about the sun. This motion 
is slow; the complete revolution takes 25,000 years. Super-
posed on it is an oscillation called a nutation, introduced by
the fact that the moon’s orbit about the earth is not always in 
the same plane. This causes the angular distance of the earth’s 
pole of figure from the pole of the ecliptic to vary in a period 
of about 19 years, while the rate of its motion has a periodic 
variation in the same time. All the phenomena contain the 
factor $(C - A)/C$, where $C$ and $A$ are the earth's greatest and
least moments of inertia. This factor is not determinable 
except from these phenomena. But the two components of the 
nutation depend on the mass of the moon, and not directly on 
that of the sun. The rate of the precession involves the masses 
of both bodies. Thus when we have observations of the pre- 
cession and nutation we can use them to determine the ratio 
$(C - A)/C$ and the mass of the moon. It is found that the
mass of the moon given by this method agrees with that 
found from the earth’s monthly motion. Thus we have a 
quantitative check on the truth of d’Alembert’s principle. 

8·62. The important constant $f$ has to be determined by
direct observation of the attractive force between bodies of 
known mass at the earth’s surface. It is found that the 
couple needed to twist a fine fibre of vitreous quartz through 
any angle is proportional to that angle. A bar with two lead 
spheres on the ends is suspended at its centroid from such a 
wire, and the period of the oscillation of the bar as it executes 
torsional oscillations is determined. The moment of inertia of 
the bar being known, this gives the couple exerted by the 
wire for any twist, in terms of c.g.s. units. Then the bar is 
allowed to take up its equilibrium position. Two large lead 
spheres are then arranged so that their attractions tend to 
twist the wire, and the head of the wire is then turned round 
until the torsion of the wire brings the bar back to its equili- 
brium position. The amount of turn required determines the 
magnitude of the attractions of the spheres and hence the 
constant $f$.
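
The reduction of such an experiment can be sketched in Python as follows. This is only an illustration under stated assumptions: the dimensions of the apparatus are hypothetical, the bar itself is neglected in the moment of inertia, each large sphere is taken to act only on the nearer small sphere, and the "observed" twist is synthetic, generated from an assumed value of the constant so that the reduction can be checked against it.

```python
import math

def torsion_constant(moment_of_inertia, period):
    """Couple per unit twist, from the period of torsional oscillation."""
    return 4.0 * math.pi ** 2 * moment_of_inertia / period ** 2

def attraction_constant(kappa, twist, small_mass, large_mass, separation, arm):
    """Constant f from the twist needed to balance the attractions.

    Each large sphere pulls the neighbouring small sphere with force
    f * small_mass * large_mass / separation**2 at lever arm `arm`,
    so the total couple is 2 * force * arm = kappa * twist.
    """
    force = kappa * twist / (2.0 * arm)
    return force * separation ** 2 / (small_mass * large_mass)

# Hypothetical apparatus in c.g.s. units (grams, centimetres, seconds).
small_mass, large_mass = 50.0, 10000.0
arm, separation, period = 10.0, 15.0, 600.0
inertia = 2.0 * small_mass * arm ** 2  # bar itself neglected (assumption)

kappa = torsion_constant(inertia, period)

# Synthetic "observation": the twist that an assumed constant would produce.
f_assumed = 6.67e-8
twist = 2.0 * arm * f_assumed * small_mass * large_mass / separation ** 2 / kappa

f_recovered = attraction_constant(kappa, twist, small_mass, large_mass, separation, arm)
print(kappa, twist, f_recovered)
```

The twist that comes out is a small fraction of a radian, which indicates why a very fine fibre and a long period are needed.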
The equations of motion can be put into a form depending
as directly on fundamental concepts as the conservation of
energy. Suppose that the co-ordinates of a particle of mass $m$,
belonging to a system moving according to these equations, are
$(x, y, z)$. Then $(x, y, z)$ are definite functions of the time.
Take any three other functions of the time $(\delta x, \delta y, \delta z)$, re-
stricted only to be differentiable. Then the equations of
motion are equivalent to

$$m(\ddot{x}\,\delta x + \ddot{y}\,\delta y + \ddot{z}\,\delta z) = X\,\delta x + Y\,\delta y + Z\,\delta z. \qquad (1)$$

Suppose these equations added for all particles of the system,
and the result integrated from time $t_0$ to time $t_1$. Then

$$\int_{t_0}^{t_1} \Sigma m(\ddot{x}\,\delta x + \ddot{y}\,\delta y + \ddot{z}\,\delta z)\,dt = \int_{t_0}^{t_1} \Sigma (X\,\delta x + Y\,\delta y + Z\,\delta z)\,dt. \qquad (2)$$

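Now imagine the system to be moved from time $t_0$ to $t_1$ in
such a way that at time $t$ the co-ordinates of the particle $m$
are $(x + \delta x, y + \delta y, z + \delta z)$. We may call $x + \delta x$ a varied
co-ordinate and $\dot{x} + \delta\dot{x}$ a varied component of velocity. Then

$$\delta\dot{x} = (\dot{x} + \delta\dot{x}) - \dot{x} = \frac{d}{dt}(x + \delta x) - \frac{dx}{dt} = \frac{d}{dt}\,\delta x. \qquad (3)$$

The left side of (2) is

$$\Big[\Sigma m(\dot{x}\,\delta x + \dot{y}\,\delta y + \dot{z}\,\delta z)\Big]_{t_0}^{t_1} - \int_{t_0}^{t_1} \Sigma m(\dot{x}\,\delta\dot{x} + \dot{y}\,\delta\dot{y} + \dot{z}\,\delta\dot{z})\,dt. \qquad (4)$$

But

$$\dot{x}\,\delta\dot{x} = \tfrac{1}{2}(\dot{x} + \delta\dot{x})^2 - \tfrac{1}{2}\dot{x}^2 - \tfrac{1}{2}\,\delta\dot{x}^2, \qquad (5)$$

so that if

$$T = \Sigma \tfrac{1}{2} m(\dot{x}^2 + \dot{y}^2 + \dot{z}^2), \qquad (6)$$

$$\Sigma m(\dot{x}\,\delta\dot{x} + \dot{y}\,\delta\dot{y} + \dot{z}\,\delta\dot{z}) = \delta T - \Sigma \tfrac{1}{2} m(\delta\dot{x}^2 + \delta\dot{y}^2 + \delta\dot{z}^2). \qquad (7)$$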
If $(\delta x, \delta y, \delta z)$ vanish at times $t_0$ and $t_1$, so that the varied
co-ordinates begin and end at the same values as the actual
ones, the first term in (4) is zero. Then

$$\int_{t_0}^{t_1} \{\delta T + \Sigma (X\,\delta x + Y\,\delta y + Z\,\delta z)\}\,dt = O(\delta\dot{x}, \delta\dot{y}, \delta\dot{z})^2. \qquad (8)$$

Now the forces in the system may be conservative, so that 
when the system is displaced from one position to another, 
by any route, the forces do the same amount of work. Then 
there is a function U depending only on the relative positions 
of parts of the system, such that 

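$$\Sigma (X\,\delta x + Y\,\delta y + Z\,\delta z) = \delta U + O(\delta x, \delta y, \delta z)^2. \qquad (9)$$

Then finally

$$\int_{t_0}^{t_1} \delta(T + U)\,dt = O(\delta x, \delta y, \delta z, \delta\dot{x}, \delta\dot{y}, \delta\dot{z})^2. \qquad (10)$$

If we define a function

$$S = \int_{t_0}^{t_1} (T + U)\,dt,$$

then $\delta S$ for small variations in the path is of the second order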
in those variations. This is Hamilton’s principle. 
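
The second-order character of $\delta S$ can be illustrated numerically. The Python sketch below is an illustration under stated assumptions, not part of the original argument: it takes a single particle in a uniform field, so that $U = -mgz$, uses the actual path $z = V_0 t - \tfrac{1}{2}gt^2$ between $t_0 = 0$ and $t_1 = 1$, and adds a variation $\delta z = \epsilon \sin \pi t$ that vanishes at both ends. The numerical values and the midpoint rule of integration are choices made for the example.

```python
import math

M, G, V0 = 1.0, 9.81, 4.905   # mass, field strength, initial velocity (illustrative)
T0, T1 = 0.0, 1.0             # limits of integration

def action(eps, steps=2000):
    """S = integral of (T + U) dt along the varied path, by the midpoint rule."""
    h = (T1 - T0) / steps
    total = 0.0
    for i in range(steps):
        t = T0 + (i + 0.5) * h
        # actual path plus a variation that vanishes at t = T0 and t = T1
        z = V0 * t - 0.5 * G * t * t + eps * math.sin(math.pi * t)
        zdot = V0 - G * t + eps * math.pi * math.cos(math.pi * t)
        total += (0.5 * M * zdot * zdot - M * G * z) * h
    return total

def delta_S(eps):
    """Change of S produced by a variation of amplitude eps."""
    return action(eps) - action(0.0)

small = delta_S(0.01)
double = delta_S(0.02)
print(small, double, double / small)  # the ratio is close to 4
```

Doubling the amplitude of the variation multiplies $\delta S$ by very nearly four, as a second-order quantity should behave; a first-order quantity would merely double.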

If some of the particles constitute a rigid body and $(\delta x, \delta y, \delta z)$
are such as not to alter the distances between them, it can be
shown that d'Alembert's principle implies that the internal
forces contribute nothing to $\Sigma (X\,\delta x + Y\,\delta y + Z\,\delta z)$. We can
therefore restrict $(\delta x, \delta y, \delta z)$ to depend only on the transla-
tions and rotations of rigid bodies and omit the contributions
from the internal forces. 
LIGHT AND RELATIVITY

It did not last: the Devil, howling Ho!

Let Einstein be! restored the status quo.

J. C. Squire 

9·1. We have seen that the truth of the equations of dynamical
astronomy in the form

$$\ddot{x}_1 = \Sigma \frac{f m_2 (x_2 - x_1)}{r_{12}^3}$$

requires that the axes shall have no rotation and that the 
origin shall have a uniform velocity with respect to the cen- 
troid of the universe. We have also supposed that at each 
instant of time each particle of the system has definite co- 
ordinates with respect to these axes. With regard to the time 
some further discussion is necessary. At first astronomers 
thought that the time in these equations was the time of 
observation. But this was found to be incorrect by Römer.
When the periods of revolution of Jupiter's satellites were
found by observation of their eclipses, transits, and occulta- 
tions when Jupiter was near opposition, the results were used 
to predict these phenomena when Jupiter was situated other- 
wise; and errors ranging up to a quarter of an hour were 
found. Römer gave the correct explanation, namely that the
time in the equations of dynamics is not the time of observa- 
tion, but the time when the event under discussion actually 
happened, and that in a visual observation the time of 
observation is later because it takes light a finite time to 
travel from the object observed to the observer. Jupiter in 
opposition is nearer to us than at other times, and therefore 
light takes a shorter time to travel to us. Events at that time 
therefore do not suffer such a great delay in being observed 
as when Jupiter is at greater distances. The delay corresponds 
to a time of passage across the earth's orbit of about 16
minutes. When the times were corrected to allow for this 
effect the anomalies disappeared. 

Direct measurement of the velocity of light near the earth’s 
surface was carried out by Fizeau and Foucault. The methods 
depend on the principle of sending out an intermittent beam of 
light to a distant object, where it is reflected, and observing the 
time that elapses between the flashes going out and coming 
back. It was found, as expected, that this time was propor-
tional to the distance traversed. With the best modern methods,
Michelson has obtained a velocity of 299,796 ± 4 km./sec. 
The experimental determinations agree with that found from 
the observations of Jupiter’s satellites within the uncertainty of 
the latter. 

When the source of light has a velocity of its own, various 
possibilities arise with regard to the effect of this velocity on 
the velocity of light. If light consisted of a stream of cor- 
puscles, it might be expected that the velocity of the source 
would be added vectorially on to the velocity of the emitted 
light. But if the velocity of light is a fundamental physical 
constant we might expect that when light gets away from the 
source it settles down to move with its standard velocity and 
forgets about its source. The velocity of light is so great that 
a small alteration of it, such as the motion of the source can 
introduce, would not affect observations within the solar 
system, but in double stars the effect might be sensible. In 
the eclipsing binary Algol, for instance, the orbital velocity 
appears to be about 240 km./sec., or 0·8 × $10^{-3}$ of the velocity
of light. The distance of the star is about 35 parsecs or
$10^{15}$ km., so that light from it ordinarily takes 3 × $10^{9}$ seconds
to reach us. The effect on the time due to a change of 0·8
parts in a thousand in the velocity of light would therefore be
2·4 × $10^{6}$ seconds or about 28 days. The whole period of revolution
of the star is under 3 days. Thus light from the fainter com- 
ponent when it is approaching us would reach us before the 
light that leaves it the next time it begins to recede from us,
and the apparent variation with time of the position of the
secondary would be completely upset*. The secondary, as it 
happens, is not visible separately, being too close to the 
primary, and is known to us principally from its regular 
eclipsing of a portion of the primary’s surface. Application of 
the test to Algol may therefore be impossible ; it is mentioned 
here merely as an illustration. 
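
The arithmetic of this illustration can be retraced in a few lines of Python. The sketch uses the rounded figures quoted in the text; the variable names are introduced here for the purpose, and the velocity of light is rounded to 3 × $10^{5}$ km./sec. as an assumption.

```python
C_KM_S = 3.0e5          # velocity of light, rounded (assumption)
ORBITAL_KM_S = 240.0    # orbital velocity quoted for Algol
DISTANCE_KM = 1.0e15    # about 35 parsecs, as quoted

# Fractional change in the velocity of light if the source velocity were added.
fraction = ORBITAL_KM_S / C_KM_S

# Ordinary time of travel of the light, and the change in it.
travel_time_s = DISTANCE_KM / C_KM_S
shift_s = fraction * travel_time_s
shift_days = shift_s / 86400.0

print(fraction, travel_time_s, shift_days)
```

Without the intermediate rounding of the travel time to 3 × $10^{9}$ seconds the delay comes out nearer 30 days; on either reckoning it is many times the period of revolution, which is all the argument requires.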

9·2. This theory of the velocity of light from a moving source
has never, as a matter of fact, been taken seriously. We know 
from the phenomenon of interference that light is a wave 
motion. The velocity of waves is a matter of the physical 
properties of the region they are traversing ; once away from 
the source they look after themselves. 

Now consider a moving system consisting of a source of light, 
with a mirror at distance $l$ away from it in the direction of the
velocity. The source and the mirror are both moving with 
velocity v. Then, on the natural way of looking at the matter, 
light leaving the source has a velocity c and is gaining on the 
mirror with relative velocity c — v. Hence it will overtake 
the mirror in time $l/(c - v)$. After reflexion it is moving with
velocity $c$ again, but towards the source, and has a relative
velocity $c + v$. Hence it returns to the source in time $l/(c + v)$,
and the total time taken is

$$\frac{l}{c - v} + \frac{l}{c + v} = \frac{2cl}{c^2 - v^2}.$$

Now suppose that the direction of the mirror from the 
source is at right angles to the velocity of the source. Then 
after the light leaves the source, the mirror and source both 
go on moving with velocity v. If t is the total time of transit 
from the source to the mirror and back, the source has mean-
while travelled a distance $vt$, and if the light on return reaches
the new position of the source it has a component of velocity
$v$ in the direction of motion of the latter. Hence its transverse
component of velocity is $(c^2 - v^2)^{1/2}$. But the total distance
travelled transversely is $2l$. Hence the time taken is $2l/(c^2 - v^2)^{1/2}$.

* The data are taken, roughly, from Eddington, The Internal Constitution
of the Stars, p. 209.

This is shorter than the time in the former case, in the ratio
$(1 - v^2/c^2)^{1/2}$. If then we can arrange for the two specimens of light
to leave the source at the same time and for the distances of the 
two mirrors to be equal, the light that has travelled transversely 
will arrive back first, and therefore in a different phase. If the 
difference of phase is great enough interference will take place. 
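
To give an idea of the magnitudes involved on this older view, the two times of transit can be evaluated in Python. This is a sketch under assumptions: the arm length of 11 metres and the velocity of 30 km./sec. (roughly that of the earth in its orbit) are illustrative values chosen here, not data from the text.

```python
import math

def longitudinal_time(l, c, v):
    """Out and back along the direction of motion, on the older theory."""
    return l / (c - v) + l / (c + v)

def transverse_time(l, c, v):
    """Out and back at right angles to the direction of motion."""
    return 2.0 * l / math.sqrt(c * c - v * v)

c = 299796.0   # km/sec, Michelson's value quoted above
v = 30.0       # km/sec, roughly the earth's orbital velocity (assumed)
l = 0.011      # km, an arm of 11 metres (hypothetical)

t_long = longitudinal_time(l, c, v)
t_trans = transverse_time(l, c, v)
ratio = t_trans / t_long

print(t_long - t_trans)                        # difference of the two times, in seconds
print(ratio, math.sqrt(1.0 - (v / c) ** 2))    # the two numbers agree
```

The difference of the times is of the order of $10^{-16}$ second, which is why it can be sought only through its effect on the phase of the returning waves.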

This experiment was carried out by Michelson and Morley, 
and has since been repeated by various other investigators. 
The two specimens of light were produced by a mirror 
silvered to semitransparency and inclined at 45° to the 
original beam from a lamp. Half the light went through to 
the mirror in the direction of the velocity of the system ; the 
other half went transversely to the other mirror. On re- 
flexion to the semitransparent mirror, the latter transmitted 
half of one beam and reflected half the other, the resulting 
beams now travelling in the same direction. The distances 
were made nearly equal. The whole apparatus was then 
turned through a right angle, so that the time of transit of one 
beam should be increased and that of the other diminished, 
and if interference did not occur in the first case it should 
occur in the second. 

It was actually found that the rotation of the apparatus 
through a right angle made no difference. If two waves took 
the same time to travel backwards and forwards with one 
setting of the apparatus, they did so again with any other 
setting. The velocity of the system in this experiment was 
the resultant of the velocity of the earth in its orbit and that 
of the sun relative, one supposes, to the centroid of the 
universe. The latter may be supposed constant; the former 
is reversed every six months. It might perhaps happen that 
the two cancelled in one position of the earth with respect 
to the sun ; but actually the result was the same at whatever 
time of the year the experiment was carried out. 
9·21. The null result of the experiment showed that there
was something wrong with the premisses. The ratio of the 
time of transmission to the distance was the same whatever 
the orientation of the distance with respect to the velocity of 
the system as a whole. The distances, for this purpose, are 
the distances between the points of reflexion of the light, as 
measured when there is no relative motion, and are therefore 
to be understood as in mensuration. Distance being taken in 
this sense, it appeared that the apparent velocity of light, 
measured as the distance travelled divided by the total time 
of passage there and back, was the same in any direction. 
The expression of this result is that if $(x, y, z)$ are the measur-
able co-ordinates, with respect to an origin, of the place where
a light-wave is at time $t$, then

$$dx^2 + dy^2 + dz^2 - c^2\,dt^2 = 0, \qquad (1)$$

irrespective of the motion of the origin and the direction of
the axes. If then we take another origin and use analogous
variables $(x', y', z', t')$ we shall have also

$$dx'^2 + dy'^2 + dz'^2 - c^2\,dt'^2 = 0. \qquad (2)$$

Either of these forms implies the other. If we write

$$ds^2 = c^2\,dt^2 - dx^2 - dy^2 - dz^2; \quad ds'^2 = c^2\,dt'^2 - dx'^2 - dy'^2 - dz'^2, \qquad (3)$$

then if either of ds and ds' is zero, the other is also zero. It 
follows that for all positions and times 

$$ds' = k\,ds, \qquad (4)$$

where $k$ may be a function of $(x, y, z, t)$. When $(x, y, z, t)$ are
determined, $(x', y', z', t')$ are also determinate, so that the
variables referred to one system of reference are definite
functions of those referred to the other. Thus $k$ cannot
involve the velocities.

Now suppose that $(x, y, z, t)$ and $(x', y', z', t')$ are both such
systems of reference that the equations of dynamics hold with
regard to them. Then if $dx/dt$, $dy/dt$, $dz/dt$ for a particle are constant,
the particle is under no forces, and therefore $dx'/dt'$, $dy'/dt'$, $dz'/dt'$
are also constant. Also $ds/dt$ and $ds'/dt'$ are constant. Therefore the
two sets of equations

$$\frac{dx}{ds},\ \frac{dy}{ds},\ \frac{dz}{ds},\ \frac{dt}{ds} = \text{constants}, \qquad (5)$$

and

$$\frac{dx'}{ds'},\ \frac{dy'}{ds'},\ \frac{dz'}{ds'},\ \frac{dt'}{ds'} = \text{constants}, \qquad (6)$$


are equivalent. Now consider a particle to move from one
given position at time $t_0$ to another at time $t_1$. Then if $(x, y, z)$
are given as functions of $t$ during the transit, $\int_{t_0}^{t_1} \frac{ds}{dt}\,dt$ has a
definite value. If we choose a slightly different path, the
change of this integral is $\delta\int ds$. We can show that the con-
ditions (5) are just the conditions that $\delta\int ds$ shall be of the
second order in the variations of $(x, y, z)$. Thus the equations
of motion of a particle under no forces are equivalent to the
statement that $\delta\int ds$ is of the second order. Similarly they
are equivalent to the statement that $\delta\int ds'$ is of the second
order. Hence if a path is such that $\delta\int ds$ is of the second order,
so is $\delta\int k\,ds$.

It follows that $k$ is constant. For if $k$ depended on $(x, y, z)$
we could take a path near the original one, but always on the
side of it where $k$ is greater than on the actual path. Then $\delta\int ds$
would be of the second order, but $\delta\int k\,ds$ would be of the
first order. If $k$ depended on $t$, we could take the same path
but alter the rate of travel so that $t$ has slightly different
values for the same $(x, y, z)$. We could arrange for

$$(dx^2 + dy^2 + dz^2)/dt^2$$

to be increased when $k$ is large and decreased when $k$ is
small. Then we get a first order change in $\int k\,ds$. Thus $k$ must
be independent of $(x, y, z, t)$. It can be seen that it is unity.
For if the origin of $(x', y', z', t')$ has a velocity with regard to
that of $(x, y, z, t)$, we can turn both sets of axes of $(x, y, z)$,
$(x', y', z')$ round so that those of $x$ and $x'$ are in the direction
of this velocity. A bar of length $l$ at right angles to this direc-
tion in one system has length $l$ in the other system, and as we
go along it $dx = dx' = 0$, $dt = dt' = 0$, so that

$$l = kl, \qquad (7)$$

and therefore

$$k = 1. \qquad (8)$$

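Thus

$$dx^2 + dy^2 + dz^2 - c^2\,dt^2 = dx'^2 + dy'^2 + dz'^2 - c^2\,dt'^2. \qquad (9)$$

Also we can write

$$\frac{dx'}{ds'} = \frac{dx'}{ds} = \frac{dx}{ds}\frac{\partial x'}{\partial x} + \frac{dy}{ds}\frac{\partial x'}{\partial y} + \frac{dz}{ds}\frac{\partial x'}{\partial z} + \frac{dt}{ds}\frac{\partial x'}{\partial t}, \qquad (10)$$

with similar equations. Now for a particle in uniform motion
$dx'/ds'$, $dx/ds$ and similar expressions are constants, but
these constants are different for different particles. Hence
this can be true in general only if $\partial x'/\partial x$, $\partial x'/\partial y$, $\partial x'/\partial z$, $\partial x'/\partial t$ are
constants. Therefore $(x', y', z', t')$ are linear functions of
$(x, y, z, t)$.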

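Now again take the axes of $x$ and $x'$ in the direction of relative motion
of the origin. Then we can arrange the directions of the $y'$
and $z'$ axes so that

$$\frac{\partial y'}{\partial x} = \frac{\partial y'}{\partial z} = \frac{\partial y'}{\partial t} = 0; \quad \frac{\partial z'}{\partial x} = \frac{\partial z'}{\partial y} = \frac{\partial z'}{\partial t} = 0. \qquad (11)$$

Hence the relations between the co-ordinates are of the form

$$x' = \beta(x - Vt), \quad y' = \gamma y, \quad z' = \delta z, \quad t' = \alpha_1 x + \beta_1 y + \gamma_1 z + \delta_1 t + \epsilon. \qquad (12)$$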
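Now substitute in (9). We have

$$\beta^2(dx - V\,dt)^2 + \gamma^2\,dy^2 + \delta^2\,dz^2 - c^2(\alpha_1\,dx + \beta_1\,dy + \gamma_1\,dz + \delta_1\,dt)^2 = dx^2 + dy^2 + dz^2 - c^2\,dt^2. \qquad (13)$$

On equating coefficients of $dx\,dy$, $dx\,dz$, $dy\,dt$, $dz\,dt$ we have

$$\alpha_1\beta_1 = \alpha_1\gamma_1 = \beta_1\delta_1 = \gamma_1\delta_1 = 0. \qquad (14)$$

Hence either

$$\alpha_1 = \delta_1 = 0, \qquad (15)$$

or

$$\beta_1 = \gamma_1 = 0. \qquad (16)$$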

If (15) is true, the coefficients of $dx^2$ and $dt^2$ give

$$\beta^2 = 1; \quad \beta^2 V^2 = -c^2, \qquad (17)$$

so that this condition cannot arise for any real relative velocity
of the origins. The alternative (16) gives,
from the coefficients of $dy^2$ and $dz^2$,

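$$\gamma = \pm 1, \quad \delta = \pm 1; \qquad (18)$$

and we may agree to take the axes of $y'$ and $z'$ so that the
positive signs are applicable. From the coefficients of $dx^2$,
$dx\,dt$, and $dt^2$,

$$\beta^2 - c^2\alpha_1^2 = 1; \quad \beta^2 V + c^2\alpha_1\delta_1 = 0; \quad \beta^2 V^2 - c^2\delta_1^2 = -c^2. \qquad (19)$$

Eliminating $\beta$ and $V$ we have

$$(1 + c^2\alpha_1^2)(\delta_1^2 - 1) = c^2\alpha_1^2\delta_1^2, \qquad (20)$$

whence

$$\delta_1^2 = 1 + c^2\alpha_1^2 = \beta^2, \qquad (21)$$

and from the last of (19)

$$\beta = \pm(1 - V^2/c^2)^{-1/2}. \qquad (22)$$

If $x'$ and $x$ are to increase in the same direction we must take
the positive sign. Similarly we shall take $\delta_1$ positive. Then the
second of (19) gives

$$\alpha_1 = -\beta V/c^2, \qquad (23)$$

and finally

$$x' = \beta(x - Vt); \quad y' = y; \quad z' = z; \quad t' = \epsilon + \beta(t - Vx/c^2), \qquad (24)$$

where

$$\beta = (1 - V^2/c^2)^{-1/2}, \qquad (25)$$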
and $\epsilon$ is an additive constant depending on the instant we
measure $t'$ from. If we choose this suitably, $\epsilon$ can be dropped.
If we solve (24) for $x$, $y$, $z$, $t$, we obtain

$$x = \beta\{x' + V(t' - \epsilon)\}; \quad y = y'; \quad z = z'; \quad t = \beta(t' - \epsilon + Vx'/c^2). \qquad (26)$$

The equations (24) were first given in full by Sir Joseph 
Larmor, who showed that a transformation of this type 
leaves the form of the equations of the electromagnetic field 
unaltered. In the hands of Einstein they became the basis of 
the special theory of relativity. 

It appears from the first of (24) that $(V, 0, 0)$ is the velocity
of the origin of the second system with reference to the first,
and from the first of (26) that $(-V, 0, 0)$ is the velocity of
the origin of the first system with reference to the second. 
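
The transformation (24) and its inverse (26) are easily tried numerically. The following Python sketch, with the additive constant $\epsilon$ dropped, transforms one event to the second system and back, and compares the value of $c^2t^2 - x^2 - y^2 - z^2$ in the two systems. The function names, the choice of units with $c = 1$, and the particular event and velocity are assumptions made for the illustration.

```python
import math

def beta(V, c):
    """The factor (1 - V^2/c^2)^(-1/2) of equation (25)."""
    return 1.0 / math.sqrt(1.0 - V * V / (c * c))

def to_second_system(x, y, z, t, V, c):
    """Transformation (24) with the additive constant dropped."""
    b = beta(V, c)
    return (b * (x - V * t), y, z, b * (t - V * x / (c * c)))

def to_first_system(xp, yp, zp, tp, V, c):
    """Inverse transformation (26), again with the constant dropped."""
    b = beta(V, c)
    return (b * (xp + V * tp), yp, zp, b * (tp + V * xp / (c * c)))

def interval_squared(dx, dy, dz, dt, c):
    """c^2 dt^2 - dx^2 - dy^2 - dz^2 for the interval from the origin event."""
    return c * c * dt * dt - dx * dx - dy * dy - dz * dz

c, V = 1.0, 0.6                     # units with c = 1; V is an arbitrary choice
event = (2.0, -1.0, 0.5, 3.0)       # (x, y, z, t) of an illustrative event

primed = to_second_system(*event, V, c)
back = to_first_system(*primed, V, c)

print(primed, back)
print(interval_squared(*event, c), interval_squared(*primed, c))
```

The event is recovered on transforming back, and the two values of the squared interval agree, as (9) requires.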

9·22. Now consider two events specified in the first system
by $(x_1, y_1, z_1, t_1)$, $(x_2, y_2, z_2, t_2)$ and in the second by corre-
sponding letters with accents. Then

$$x_2' - x_1' = \beta\{x_2 - x_1 - V(t_2 - t_1)\}, \qquad (1)$$

$$y_2' - y_1' = y_2 - y_1, \qquad (2)$$

$$z_2' - z_1' = z_2 - z_1, \qquad (3)$$

$$t_2' - t_1' = \beta\{t_2 - t_1 - V(x_2 - x_1)/c^2\}. \qquad (4)$$

Hence distances perpendicular to the direction of relative
motion are the same in the two systems.

Suppose the events are simultaneous in the second system,
so that $t_2' = t_1'$. Then from (4)

$$t_2 - t_1 = V(x_2 - x_1)/c^2, \qquad (5)$$

and on substitution in (1)

$$x_2' - x_1' = \beta(1 - V^2/c^2)(x_2 - x_1) = (1 - V^2/c^2)^{1/2}(x_2 - x_1). \qquad (6)$$

If then $x_2 - x_1$ is independent of the time, $x_2' - x_1'$ is constant
and less than $x_2 - x_1$ in a definite ratio. If on the other hand
$t_2 = t_1$, we shall find

$$x_2 - x_1 = (1 - V^2/c^2)^{1/2}(x_2' - x_1'). \qquad (7)$$
Now when we measure a distance on a moving object we
compare the positions of the ends of the object simultaneously
with those of points with no motion relative to our axes.
The equality of $t_1'$ and $t_2'$ is essential to the attribution of any
meaning to $x_2' - x_1'$ in terms of distance when the co-
ordinates themselves are varying with time. If we have an
object whose length in the $x$ direction in the first system is
independent of the time in the first system, then its length
in the second system is less than its length in the first in the
ratio $(1 - V^2/c^2)^{1/2} : 1$, and conversely. This apparent contraction
of a moving object in the direction of motion is known as the
Fitzgerald contraction, and depends, as we see from (5), on
the fact that two observers in relative motion differ in their 
ideas of what events are simultaneous on account of the 
finiteness of the velocity of light. 
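
A short numerical case may make the part played by simultaneity plainer. In the Python sketch below a bar of length 10 lies at rest along the $x$ axis of the first system; the two events chosen at its ends differ in $t$ by $V(x_2 - x_1)/c^2$, so that they are simultaneous in the second system. The units ($c = 1$), the velocity and the length are assumptions made for the illustration.

```python
import math

def to_second_system(x, t, V, c):
    """x' and t' from transformation (24), additive constant dropped."""
    b = 1.0 / math.sqrt(1.0 - V * V / (c * c))
    return b * (x - V * t), b * (t - V * x / (c * c))

c, V, length = 1.0, 0.6, 10.0       # illustrative values, units with c = 1

# Events at the two ends of the bar, chosen to be simultaneous
# in the second system by means of equation (5).
x1, t1 = 0.0, 0.0
x2 = length
t2 = t1 + V * (x2 - x1) / (c * c)

x1p, t1p = to_second_system(x1, t1, V, c)
x2p, t2p = to_second_system(x2, t2, V, c)

print(t2p - t1p)                    # zero up to rounding
print(x2p - x1p, length * math.sqrt(1.0 - V * V / (c * c)))
```

The separation of the ends in the second system comes out as 8, that is, the length in the first system multiplied by $(1 - V^2/c^2)^{1/2}$, while the two events are separated in the first system by a time 6.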

9·23. We can now identify the time of a distant event in
terms of light signals. For if a mirror is at a fixed distance $l$,
then light takes a time $l/c$ to reach it, and a further time $l/c$
to return. Thus the time when reflexion occurs is the mean 
of those when the wave leaves the source and returns to it. 
This result is irrespective of the velocity of the system, and is 
not true on the older theory, where the times of the outgoing 
and returning waves were liable to differ, as we saw in dis- 
cussing the Michelson-Morley experiment. 

The chief difficulty usually felt in relation to the modern 
theory of relativity is precisely in connexion with the result 
that events that are simultaneous to one observer are not 
simultaneous to another. But this difficulty arises in reality 
at a much earlier stage than the transformation (24). If the
time means the time of observation, then the observations of 
Jupiter’s satellites prove that the equations of dynamics are 
untrue, and conversely, if we are to retain the equations of 
dynamics, the time of an event is not the time of observation. 
We must therefore have a rule to enable us to infer the one 
from the other, of such a character as will keep the equations 
of dynamics true. The only rule that will satisfy the criteria
that we have already is (24). The time of observation being 
different from the time of the event, and depending on the 
position of the observer, as a matter of observation, the whole 
conception of simultaneity required rediscussion from the 
start. In any case the times of different observers’ observa- 
tions of the same event differ by quantities of the order of 
the differences of $r/c$, where $r$ is the distance travelled by the
light. In the new theory we have obtained a time of the
event itself, which varies for different observers by quantities
of the order of $Vr/c^2$. But $V/c$ is in general small; thus
the deviations between different observers from agreement 
about time-intervals are of the second order of small 
quantities instead of the first. The objection is in fact a 
straining at the gnat, while swallowing the camel without 
even noticing the existence of the larger animal. 


9·24. Now consider a point moving with velocities $(u, v, w)$
with reference to the system $(x, y, z, t)$. Then its velocities
with reference to the system $(x', y', z', t')$ are

$$u' = \frac{dx'}{dt'} = \frac{\beta(dx - V\,dt)}{\beta(dt - V\,dx/c^2)} = \frac{u - V}{1 - uV/c^2}, \qquad (1)$$

$$v' = \frac{dy'}{dt'} = \frac{dy}{\beta(dt - V\,dx/c^2)} = \frac{v}{\beta(1 - uV/c^2)}, \qquad (2)$$

$$w' = \frac{w}{\beta(1 - uV/c^2)}. \qquad (3)$$

We notice that if $(u, v, w) = (c, 0, 0)$, then $(u', v', w') = (c, 0, 0)$
whatever $V$ may be. If $(u, v, w) = (0, c, 0)$, then
$(u', v', w') = (-V, c/\beta, 0)$ and $u'^2 + v'^2 + w'^2 = c^2$.

These results are of course particular cases of our fundamental 
rule that the velocity of light is the same however the observer 
is moving. 
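
These two particular cases can be checked directly from the formulae (1) to (3). The Python sketch below is an illustration only; the function name, the units with $c = 1$ and the value of $V$ are assumptions introduced here.

```python
import math

def transform_velocity(u, v, w, V, c):
    """Velocity components in the second system, from 9.24 (1) to (3)."""
    b = 1.0 / math.sqrt(1.0 - V * V / (c * c))
    denom = 1.0 - u * V / (c * c)
    return ((u - V) / denom, v / (b * denom), w / (b * denom))

c, V = 1.0, 0.6                     # illustrative values, units with c = 1

# Light travelling along the direction of relative motion.
along = transform_velocity(c, 0.0, 0.0, V, c)

# Light travelling at right angles to it.
across = transform_velocity(0.0, c, 0.0, V, c)
speed_across = math.sqrt(sum(comp * comp for comp in across))

print(along)
print(across, speed_across)
```

In the second case the direction of the ray is changed, the components being $(-0.6, 0.8, 0)$, but the resultant velocity is still $c$.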
We notice a curious phenomenon if $V$ should happen to
be greater than $c$, the velocity of light. Imagine a particle
moving with velocities $(u, v, w)$, and consider its velocities
with respect to an origin moving with velocity $V$. The quan-
tity $\beta$ is imaginary if $V/c > 1$. Hence $v'$ and $w'$ are imaginary,
and the particle could have real co-ordinates at only one 
instant ; for the rest of time its co-ordinates are imaginary — 
which is as much as to say that the particle is imaginary. 
There seems to be no inherent contradiction in the idea of 
velocities greater than that of light : but if we consider as our 
universe all particles moving with velocities less than c with 
respect to ourselves, then any particle with a velocity greater 
than c with respect to ourselves has a velocity greater than c 
with respect to any other particle of our universe. The world 
could then be classified into universes, such that no particle 
in any one universe could be perceptible for more than a 
fleeting instant from a different universe. 

9·25. We now consider other observable consequences of the
transformation. Consider a source of light sending out waves
of period $2\pi/\gamma$ along the axis of $x$. Then the disturbance at any
distance contains a factor such as

$$\sin \gamma(t - x/c). \qquad (1)$$

Now consider an observer with a velocity $V$ along the $x$ axis.
Using (26) we have

$$\sin \gamma(t - x/c) = \sin \gamma'(t' - x'/c), \qquad (2)$$

where

$$\gamma' = \gamma\beta(1 - V/c), \qquad (3)$$

so that the period of the disturbance reaching the observer is
longer than that estimated by a stationary observer in the
ratio $1 : \beta(1 - V/c)$. In practice $V/c$ is always small and $\beta$
indistinguishable from unity. But the factor $1 - V/c$ pro-
duces an apparent lengthening of the wave-length of a given
spectral line for a receding star, and a shortening of it for an 
approaching one. This is the Doppler effect, and is measur- 
able. It leads to estimates of the radial velocities in double 
stars which agree with those inferred from the transverse 
movements, and has many other astronomical applications. 
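
The size of the effect for a typical stellar velocity can be seen from a short computation. The Python sketch below evaluates the factor $\beta(1 - V/c)$ of (3) for an assumed velocity of recession of 30 km./sec.; the velocity and the function name are illustrative choices, not data from the text.

```python
import math

def frequency_ratio(V, c):
    """gamma'/gamma = beta * (1 - V/c), equation (3)."""
    b = 1.0 / math.sqrt(1.0 - V * V / (c * c))
    return b * (1.0 - V / c)

c = 299796.0    # km/sec
V = 30.0        # km/sec, a hypothetical velocity of recession

ratio = frequency_ratio(V, c)
wavelength_factor = 1.0 / ratio     # observed wave-length / emitted wave-length

print(ratio, wavelength_factor, 1.0 + V / c)
print(frequency_ratio(-V, c))       # an approaching source: ratio above unity
```

The wave-length is increased by about one part in ten thousand, and the part contributed by $\beta$ is of the order of the square of this, which is why $\beta$ can be treated as unity in practice.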


9·26. Now suppose that an observer in the $(x, y, z, t)$ system
sees a star in the direction $(l, m, n)$. Then the velocity com-
ponents of the light from the star are

$$(u, v, w) = -(lc, mc, nc). \qquad (1)$$

To an observer in the $(x', y', z', t')$ system the apparent direc-
tion is $(l', m', n')$ and velocity components are, by 9·24,

$$-l'c = \frac{-lc - V}{1 + lV/c}; \quad -m'c = \frac{-mc}{\beta(1 + lV/c)}; \quad -n'c = \frac{-nc}{\beta(1 + lV/c)}. \qquad (2)$$

Thus $m'/n' = m/n$, and the directions $(l, m, n)$ and $(l', m', n')$
lie in a plane including the axis of $x$. If

$$l = \cos\theta; \quad l' = \cos\theta', \qquad (3)$$

$$\cos\theta' = \frac{l + V/c}{1 + lV/c} = l + (1 - l^2)\,\frac{V}{c}\,\frac{1}{1 + lV/c}, \qquad (4)$$

or, if we neglect the square of $V/c$,

$$\theta' - \theta = -\frac{V}{c}\sin\theta. \qquad (5)$$


Thus the apparent direction of the star is displaced towards 
the direction of the relative motion of the second observer by 
an amount given by (5). This is the phenomenon of aberration. 
On account of the earth's orbital motion its velocity relative 
to the sun varies in the course of a year, and therefore pro-
duces periodic variations in the apparent directions of the
stars. These are the same for all stars in the same part of the 
sky, and are well known to astronomers. 
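
The magnitude of the displacement can be estimated in Python. The sketch compares the exact angle given by $\cos\theta' = (\cos\theta + V/c)/(1 + \cos\theta\, V/c)$ with the first-order expression $-(V/c)\sin\theta$ for a star at right angles to the direction of motion. The orbital velocity of 29·8 km./sec. is an approximate value assumed here for the illustration.

```python
import math

def apparent_angle(theta, V, c):
    """Exact theta' from cos(theta') = (cos(theta) + V/c) / (1 + cos(theta) V/c)."""
    l = math.cos(theta)
    return math.acos((l + V / c) / (1.0 + l * V / c))

c = 299796.0            # km/sec
V = 29.8                # km/sec, approximate orbital velocity of the earth (assumed)
theta = math.pi / 2.0   # star at right angles to the direction of motion

exact_shift = apparent_angle(theta, V, c) - theta
approx_shift = -(V / c) * math.sin(theta)
arcsec = math.degrees(abs(exact_shift)) * 3600.0

print(exact_shift, approx_shift, arcsec)
```

The displacement comes out at about 20·5 seconds of arc, and the exact and first-order values are indistinguishable at this velocity.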


9·27. Now consider light emitted from a source and entering
water moving with velocity $V$ in the direction of the beam.
The velocity of light relative to the water is $c/\mu$, where $\mu$ is
the refractive index. The velocity of the source relative to
the water is $-V$. Using 9·24 (1) to get the velocity of the
light relative to the source, we get

$$u' = \frac{c/\mu + V}{1 + (c/\mu)V/c^2} = \frac{c}{\mu} + V\left(1 - \frac{1}{\mu^2}\right) + \ldots. \qquad (1)$$




This is tested in Fizeau’s experiment. Water travels in a 
closed pipe so that when the light is travelling outwards to 
a distant mirror the water is moving with it, and the reflected 
beam travels with the return current, so that the effect of V 
is to increase u' on both the outward and the return journeys. 
Another beam is sent round the other way, so that its ap- 
parent velocity is reduced by the motion of the water. The 
two beams are recombined on return and the difference in 
the times of travel measured by a method of interference. It 
was found in a repetition of the experiment by Michelson 
and Morley that the observed value* of the coefficient of $V$
was 0·442 ± 0·02. The value calculated from the refractive
index $\mu$ was 0·438; but this became 0·451 when a refinement
allowing for dispersion was made. The agreement is within 
the error of observation. A further repetition by Zeeman 
gave almost perfect agreement. 

The result of Fizeau’s experiment is extremely important. 
If the velocity of light in a moving medium was the sum of 
the ordinary velocity and the velocity of the medium, the 
coefficient of $V$ would have been 1. If the velocity was in-
dependent of that of the medium it would have been 0.


* A numerical correction due to Cunningham, Relativity and the
Electron Theory, 1920, has been used.
The experiment excludes both of these alternatives. It shows
also that the actual coefficient agrees with that calculated
from (1); and the term in $1/\mu^2$, if we trace it back, is found to
come from the $uV/c^2$ in the denominator of 9·24 (1), which
came in turn from the term in $Vx/c^2$ in the expression for $t'$.
Thus it gives a direct check on this term, which is the very 
one in the fundamental transformation that has been most 
subject to dispute. 
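
The coefficient of $V$ can be evaluated numerically from the composition of velocities. The Python sketch below compares the first-order coefficient $1 - 1/\mu^2$ with the value obtained from the exact quotient in 9·27 (1). The refractive index 1·333 and the speed of the water, 7 metres per second, are assumed values for the illustration; no allowance is made for dispersion.

```python
def drag_coefficient(mu):
    """First-order coefficient of V, namely 1 - 1/mu^2."""
    return 1.0 - 1.0 / (mu * mu)

def light_speed_in_moving_water(mu, V, c):
    """Velocity relative to the source, from the exact form of 9.27 (1)."""
    return (c / mu + V) / (1.0 + (c / mu) * V / (c * c))

c = 2.99796e8   # metres per second
mu = 1.333      # refractive index of water (assumed value)
V = 7.0         # metres per second, hypothetical speed of the water

exact = light_speed_in_moving_water(mu, V, c)
effective = (exact - c / mu) / V    # coefficient of V actually produced

print(drag_coefficient(mu), effective)
```

Both numbers are close to 0·437, near the value 0·438 quoted above before the correction for dispersion, and plainly distinct from either 1 or 0.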

9·3. The foregoing theory is usually known as Einstein's
special theory of relativity, though the use of the word 
relativity as if it expressed a novel feature is really incorrect. 
The relativity of the equations of dynamics, in the sense that 
they are true whatever unaccelerated non-rotating axes we 
use, had been known since Newton. The need for Einstein’s 
theory arose from two facts about light: first, that it has a 
finite velocity, and second, that this velocity is independent 
of the motion of the observer. If light had travelled with an 
infinite velocity we could have identified the time of the event 
with the time of observation, and there would have been no 
further trouble. But this ceased to be a serious possibility 
when Römer made his discovery about Jupiter's satellites;
the time in the equations of dynamics is not the time when 
the observations are made. The modern problem is not the 
discovery of relativity, but to retain relativity without in- 
troducing inconsistency with what we know about light. 
Even with the modification we have made so far, in the rela- 
tions between the co-ordinates and time in different systems 
of reference, the equations of dynamics lose their relativistic 
form. 

9·31. Consider two bodies moving according to the equations

$$m_1\ddot{x}_1 = -m_2\ddot{x}_2 = f m_1 m_2 (x_2 - x_1)/r^3, \qquad (1)$$

and so on. The co-ordinates and time are those of an un- 
accelerated observer. Now imagine another observer with 
velocity components (V, 0, 0) with reference to the first. Applying the
transformation of 9·24 we get

$$\frac{d^2x'}{dt'^2} = \frac{1}{\beta^3(1 - uV/c^2)^3}\,\frac{d^2x}{dt^2}, \qquad (2)$$

$$\frac{d^2y'}{dt'^2} = \frac{1}{\beta^2(1 - uV/c^2)^3}\,\frac{d^2y}{dt^2} + \frac{V}{c^2\beta^2(1 - uV/c^2)^3}\left(\frac{dy}{dt}\frac{d^2x}{dt^2} - \frac{dx}{dt}\frac{d^2y}{dt^2}\right), \qquad (3)$$

with an analogous expression for $d^2z'/dt'^2$.

It is clear from these equations that in the new system
$d^2x_1'/dt'^2$ and $d^2x_2'/dt'^2$ cannot be in a constant ratio. For in (2), u
appears explicitly, and is different for the two bodies and
variable for both. The special theory is not relativistic when
applied to the equations of dynamics, except for unaccelerated
particles.
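A small numerical illustration of this point, offered as a sketch only. It assumes that the x-acceleration is multiplied by $1/\beta^3(1 - uV/c^2)^3$ on passing to the second observer, with $\beta = (1 - V^2/c^2)^{-1/2}$; the velocities are arbitrary illustrative values in units where c = 1.

```python
import math


def accel_factor(u, V, c=1.0):
    """Factor multiplying d2x/dt2 when passing to an observer moving with V.

    u is the x-velocity of the body in the original system.
    """
    beta = 1.0 / math.sqrt(1.0 - V**2 / c**2)
    return 1.0 / (beta**3 * (1.0 - u * V / c**2) ** 3)


V = 0.5                      # velocity of the second observer (illustrative)
u_first, u_second = 0.1, 0.3  # x-velocities of the two bodies (illustrative)

f_first = accel_factor(u_first, V)
f_second = accel_factor(u_second, V)

# The ratio of the two accelerations is multiplied by f_first / f_second,
# which differs from 1 whenever the two bodies have different velocities.
print(f_first, f_second, f_first / f_second)
```

Since each body's velocity changes during the motion, the multiplying ratio changes with it, so a constant ratio of accelerations in one system cannot remain constant in the other.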

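We may notice, however, that with ordinary velocities
$ds^2/c^2dt^2$ is nearly 1, and $ds' = ds$. We might try then to
modify the equations by replacing $d/dt$ by $c\,d/ds$. Then

$$\frac{dx'}{ds'} = \beta\left(\frac{dx}{ds} - V\frac{dt}{ds}\right), \quad \frac{dy'}{ds'} = \frac{dy}{ds}, \quad \frac{dz'}{ds'} = \frac{dz}{ds}, \qquad (4)$$

$$\frac{d^2x'}{ds'^2} = \beta\left(\frac{d^2x}{ds^2} - V\frac{d^2t}{ds^2}\right), \quad \frac{d^2y'}{ds'^2} = \frac{d^2y}{ds^2}, \quad \frac{d^2z'}{ds'^2} = \frac{d^2z}{ds^2}. \qquad (5)$$

But

$$c^2\left(\frac{dt}{ds}\right)^2 - \left(\frac{dx}{ds}\right)^2 - \left(\frac{dy}{ds}\right)^2 - \left(\frac{dz}{ds}\right)^2 = 1, \qquad (6)$$

so that

$$c^2\,\frac{dt}{ds}\frac{d^2t}{ds^2} = \frac{dx}{ds}\frac{d^2x}{ds^2} + \frac{dy}{ds}\frac{d^2y}{ds^2} + \frac{dz}{ds}\frac{d^2z}{ds^2}. \qquad (7)$$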


Thus d^x'jds'^ depends not only on the accelerations in the 
original frame of reference but on the velocities, and the 
velocities are variable and different for the two bodies. Hence 
the values of d^x'/ds'^ for the two bodies are still not in a 
constant ratio, and the equations of dynamics do not satisfy 
the principle of relativity. 
9·4. We saw that the special theory depended on two
postulates: a particle moving with uniform velocity with 
respect to one system of reference has uniform velocity with 
respect to any other system; and the velocity of light is the 
same in any system. We saw also that these propositions can 
be expressed by saying that $ds = ds'$, that the path of an unaccelerated
particle is specified by the equation $\delta\int ds = 0$ to the
first order, and that the path of a light wave is the limit of that
of a particle when the velocity approaches c. These have a very
general form. Now we saw that we could put the equations
of dynamics in a very general form

$$\delta\int_{t_0}^{t_1}\left\{\sum \tfrac{1}{2}m(\dot{x}^2 + \dot{y}^2 + \dot{z}^2) + U\right\}dt = 0, \qquad (1)$$

to the first order. For particles under no forces $U = 0$. But

$$\int ds = c\int\left\{1 - \frac{1}{2c^2}(\dot{x}^2 + \dot{y}^2 + \dot{z}^2) + O(c^{-4})\right\}dt. \qquad (2)$$

Now if we do not vary the values of $(x, y, z, t)$ at the limits
the first term of (2) is just $c(t_1 - t_0)$ and its variation is zero.
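The size of the neglected terms in this expansion can be seen from a short numerical sketch. It assumes only that $ds/c\,dt = (1 - v^2/c^2)^{1/2}$ for a particle of velocity v; the function names are illustrative.

```python
import math


def exact_ratio(beta):
    """ds / (c dt) for a particle with velocity v = beta * c."""
    return math.sqrt(1.0 - beta**2)


def two_term_ratio(beta):
    """The first two terms of the expansion in powers of 1/c."""
    return 1.0 - 0.5 * beta**2


for beta in (0.1, 0.01, 0.001):
    error = two_term_ratio(beta) - exact_ratio(beta)
    # The leading neglected term is beta^4 / 8, that is, of order c^-4.
    print(beta, error, beta**4 / 8.0)
```

For planetary velocities beta is of the order of one part in ten thousand, so the neglected terms are far below the second term, which is the one that reproduces the dynamical integrand.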

The second term, apart from a constant factor, leads to an 
equation of the same form as (i) takes for an unaccelerated 
particle. This strongly suggests that there is an analogy 
between Hamilton's principle and the stationary property of
$\int ds$. If in fact we consider

$$\int\{c^2dt^2 - dx^2 - dy^2 - dz^2 - 2U\,dt^2\}^{1/2}, \qquad (3)$$

we have an integral that behaves in the proper way when U 
is zero, and yields Hamilton's principle as an approximation, 
with errors of order $c^{-2}$, when U is variable. Alternatively,
if we introduce $V_i$, the gravitation potential at the particle
$m_i$, we have

$$U = \tfrac{1}{2}\sum m_i V_i, \qquad (4)$$

since $\sum m_i V_i$ takes each pair of particles twice. Then the
integrand for each particle becomes

$$\{c^2dt_i^2 - dx_i^2 - dy_i^2 - dz_i^2 - 2V_i\,dt_i^2\}^{1/2}, \qquad (5)$$

so that if we redefine $ds_i$ by the equation

$$ds_i^2 = (c^2 - 2V_i)\,dt^2 - dx^2 - dy^2 - dz^2, \qquad (6)$$

we can sum up our present knowledge in the form

$$\delta\sum m_i\int ds_i = 0, \qquad (7)$$

to the first order.

Clearly an infinite number of such hypotheses would 
satisfy our present data equally well ; for all that is necessary 
is that the integrand should reduce, at a great distance from 
attracting bodies, to the ds of the special theory, and that near 
attracting bodies the second approximation should be of the
form

$$c\sum m\int\left\{1 - \frac{1}{2c^2}(\dot{x}^2 + \dot{y}^2 + \dot{z}^2 + 2V)\right\}dt. \qquad (8)$$

Any terms of order $(\dot{x}^2 + \dot{y}^2 + \dot{z}^2)^2/c^4$ or $V^2/c^4$ could be
included in the coefficient of dt without disturbing our present
knowledge.

At this stage there are two possible lines of progress. One 
is that actually adopted by Einstein, which led to the general 
theory of relativity. We notice that the Newtonian equations 
have a form that is unaffected by a uniform velocity of the 
origin, while the properties of light and freely moving par- 
ticles at a distance from matter can be put in a simple form 
depending only on the ds of the special theory, which is in- 
dependent of the choice of origin. This property, that the 
fundamental equations are independent of the velocity of the 
origin (and of course the directions of the axes, so long as they 
are not rotating), is of a very simple and general character, 
and therefore has a moderate prior probability. Its verifica- 
tion to a considerable order of accuracy by the phenomena 
considered in the special theory and by the laws of Newtonian 
dynamics therefore establishes a high probability that it is 
true exactly and in general. We may therefore take it as a
fundamental postulate and develop its consequences. If 
these turn out to be verified its probability will approach 
certainty. 

The other line of attack is to begin with the observed 
phenomena and find what is the simplest law that fits them. 
It is found that this law has relativistic properties, and affords 
justification for trying to push the principle of relativity 
still further. 

9·41. Starting from our recognition in the special theory of
the fundamental importance of ds, we see from (7) that there
is a possibility of retaining it in a gravitational field, provided 
we modify the coefficients slightly. But if ds is to have such 
an importance it must be the same for all observers, and it is 
easy to see that this casts our whole scheme of Cartesian co- 
ordinates and time into the melting-pot. Imagining the 
coefficients to have been modified suitably, we must suppose, 
as our obvious generalization from the special theory applic- 
able in the absence of a gravitational field, that the motion 
of particles in a gravitational field is such that $\int ds$ is stationary
for small variations in the path, and that the path of a light 
wave is the limit of the path of a particle when it is such that 
ds = o between any two consecutive points on the path. But 
in that case, since gravity appears explicitly in ds, light is
affected by gravity, light rays may be curved in a gravitational 
field, and our test of collinearity among distant objects breaks 
down; our laws relating distances then become approxima- 
tions and do not hold exactly. We might try to save them by 
saying that in a gravitational field the ds suitable for light still 
has constant coefficients, so that light still travels in straight 
lines, but that the form suitable for material particles does 
involve the gravitational field. But at the present time this 
possibility is hardly worth discussing, because we do know 
that light rays are curved in a gravitational field, and there is 
no justification for trying to treat light and material particles
independently. The properties of distance, as exact relations,
have already been seen to need some modification when there 
is relative motion in the system. For an observer may find 
by trial with measuring rods that two distances along his x and 
y axes are equal ; but to an observer moving along the first 
observer’s x axis these distances will appear different. The con- 
cept of distance has a definite meaning only in the absence of 
relative motion. But ds retains a definite value even when 
there is relative motion. What we are still entitled to say, then, 
is that with reference to any observer the position of a particle 
at any instant can always be specified by three variables 
$x_1$, $x_2$, $x_3$, the instant itself being specified by a fourth
time-like variable $x_4$. Then we may say that an event is specified by
the four variables $x_1$, $x_2$, $x_3$, $x_4$. If two events happen at
neighbouring places at a short interval of time, we can say that $ds^2$
is a quadratic function of the changes of the four variables,
the coefficients being functions of the variables. We write then

$$ds^2 = g_{11}dx_1^2 + g_{22}dx_2^2 + g_{33}dx_3^2 + g_{44}dx_4^2 + 2g_{12}dx_1dx_2 + \dots + 2g_{34}dx_3dx_4 \qquad (1)$$

$$= g_{ij}\,dx_i\,dx_j \quad (i, j = 1, 2, 3, 4), \qquad (2)$$

where the g's are to be determined. In (2) we use the sum- 
mation convention of tensor calculus, that where a suffix 
such as i or j is repeated it is to be given all its possible values 
in turn and the results added up. By symmetry we can take 

$$g_{ij} = g_{ji}. \qquad (3)$$

In the absence of gravitation we can take $ix_1$, $ix_2$, $ix_3$ to be the
Cartesian co-ordinates, and $x_4$ to be $ct$. Then

$$g_{11} = g_{22} = g_{33} = g_{44} = 1, \qquad (4)$$

$$g_{12} = g_{13} = \dots = g_{34} = 0. \qquad (5)$$

In presence of gravitation the g’s will be modified. We have
one consideration from Newtonian dynamics to guide us. 
The departure of the velocity of a particle from constancy 
depends on the first space derivatives of the gravitation 
potential V. Far away from matter, these are zero, and the
particle moves in a straight line. If they are constant, all
particles have the same acceleration. Near other matter,
these derivatives are not zero or constant; but a function
formed from their variations from place to place, namely

$$\nabla^2 V = \frac{\partial^2 V}{\partial x^2} + \frac{\partial^2 V}{\partial y^2} + \frac{\partial^2 V}{\partial z^2}, \qquad (6)$$

is still zero outside matter, but finite inside it. Now it appears
from 9·4 (6) that the variable parts of the $g_{ij}$ are likely to
reduce approximately to multiples of V. Also if the $g_{ij}$ were
constants the relation $\delta\int ds = 0$ would imply uniform velocity.
Thus the existence of accelerations still depends on variations
of the $g_{ij}$, as before on variations of V; and we look for a set
of second order differential equations that may hold outside
matter, corresponding to (6).

Einstein’s procedure is to notice that the condition that a 
particle shall move with uniform velocity is equivalent to the 
condition that $ds^2$ can, by a transformation of co-ordinates,
be put in form such that (4) and (5) hold. With some forms
of the original $g_{ij}$ this is possible; with others it is not. When
it is possible the field is called Galilean, and the special theory
of relativity applies. The condition that it may be possible is
that a certain fourth order tensor $B^{l}_{ijk}$, depending on the $g_{ij}$
and their first and second derivatives with regard to the
co-ordinates, shall be zero. This on the face of it has 256
components, but on account of various symmetry relations only
20 are actually independent. The vanishing of all components 
of this tensor is the condition for the absence of a gravitational 
field. Einstein looks then for a set of equations formed from 
them that may persist in the neighbourhood of matter, but 
outside it, and a suitable set is found to be 

$$G_{ij} = B^{l}_{ijl} = 0, \qquad (7)$$

where in accordance with the summation convention l is
given all the values 1, 2, 3, 4 and the results added. Then $G_{ij}$
is a tensor of the second order, with the same number of
components as the original $g_{ij}$, and its vanishing gives the
requisite number of differential equations for the latter. It
can be shown that if $G_{ij}$ vanishes in one set of co-ordinates
it does so in all, so that we can write down these equations
in any co-ordinates we may choose. Inside matter $G_{ij}$ is not
zero. When the field is not varying with the time, that is, if
all derivatives of the $g_{ij}$ with regard to $x_4$ are zero, $G_{44} = 0$ is
found to be equivalent to $\nabla^2 g_{44} = 0$, apart from one further
term. Within matter, by analogy with Newton’s law,
we may therefore say that $G_{44}$, like $\nabla^2 V$, is proportional to
the density. The three components $G_{14}$, $G_{24}$, $G_{34}$ are related
to the momentum per unit volume, and the six $G_{11}$, $G_{12}$, ..., $G_{33}$
to the six components of stress that occur in the theory of
elasticity.
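The component counts quoted here follow from simple combinatorics. The sketch below assumes the standard results that a fourth order tensor in n dimensions has $n^4$ components, that the symmetry relations of the curvature tensor leave $n^2(n^2 - 1)/12$ of them independent, and that a symmetrical second order tensor has $n(n + 1)/2$; the function names are illustrative.

```python
def total_components(n):
    """Components of a fourth order tensor in n dimensions."""
    return n**4


def independent_curvature_components(n):
    """Independent components left by the symmetries of the curvature tensor."""
    return n * n * (n * n - 1) // 12


def symmetric_second_order_components(n):
    """Independent components of a symmetrical second order tensor."""
    return n * (n + 1) // 2


n = 4  # three position co-ordinates and the time
print(total_components(n))                   # 256
print(independent_curvature_components(n))   # 20
print(symmetric_second_order_components(n))  # 10
```

The last figure is the number of distinct $g_{ij}$ when $g_{ij} = g_{ji}$, and equally the number of distinct $G_{ij}$, which is why the vanishing of $G_{ij}$ supplies the requisite number of equations.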

The solution of the equations has actually been carried out 
completely in only one case, that where the field is sym- 
metrical. In the case of the sun, for instance, we may imagine
the time to be that of an observer on the sun, and the
direction of a particle specified by the usual angular
co-ordinates $\theta$ and $\phi$. Another co-ordinate is needed to give the
distance from the sun. Now if we imagine a short rod placed
at right angles to the radius from the sun, it subtends a small
angle, $d\psi$ say, at the sun. Its length being $d\sigma$, we say that the
distance r is to be given as $d\sigma/d\psi$. Then for such small
displacements as make dr and dt zero, we define r by

$$ds^2 = -r^2(d\theta^2 + \sin^2\theta\,d\phi^2), \qquad (8)$$

and in general

$$ds^2 = g_{11}(r)\,dr^2 - r^2(d\theta^2 + \sin^2\theta\,d\phi^2) + g_{44}(r)\,dt^2, \qquad (9)$$

for by symmetry $g_{11}$ and $g_{44}$ must be functions of r only.
Einstein proceeds to obtain the $G_{ij}$, and finds that they can
vanish only if

$$g_{11}(r) = -(1 - 2fm/c^2r)^{-1}; \qquad g_{44}(r) = c^2(1 - 2fm/c^2r), \qquad (10)$$

where m is seen, on comparison with Newton’s theory, to be
identical with the mass of the sun.
It is found that with this form of $ds^2$ the paths of the planets
still agree with those found from the Newtonian law within 
the errors of observation, with one exception. The new law 
is found to imply that the path of a planet is not exactly an 
ellipse, but a slowly revolving ellipse, the direction of the 
major axis turning round at a constant rate. This change is 
inappreciable by observation except for the planet Mercury, 
which was known to have an outstanding departure from the 
Newtonian theory of just this character; and the amount 
found from Einstein’s theory agreed closely with that already 
known to exist. 

The form of a ray of light near the sun was found to be 
curved, so that stars would not be seen in quite their usual 
directions if the light from them to the earth passed near the 
sun on the way. The amount of the deflexion was calculated, 
and the amount observed at the total eclipse of the sun in 
1919 and at several later eclipses agrees with it. 
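The two effects can be evaluated in a few lines. This is a sketch under stated assumptions: the advance of perihelion per revolution is taken as $6\pi fm/c^2l$ with $l = a(1 - e^2)$, the deflexion of a grazing ray as $4fm/c^2R$, and the constants below are modern reference values introduced for illustration, not figures given in the text.

```python
import math

FM_SUN = 1.32712440018e20  # f times the mass of the sun, m^3 s^-2 (assumed)
C = 299_792_458.0          # free velocity of light, m/s
R_SUN = 6.957e8            # radius of the sun, m (assumed)
ARCSEC_PER_RADIAN = 180.0 * 3600.0 / math.pi

# Assumed orbital elements of Mercury.
A_MERCURY = 5.7909e10      # semi-major axis, m
E_MERCURY = 0.2056         # eccentricity
PERIOD_DAYS = 87.969       # period of revolution, days


def perihelion_advance_per_revolution(fm, a, e, c=C):
    """Advance of perihelion in radians per revolution: 6 pi fm / (c^2 l)."""
    l = a * (1.0 - e**2)
    return 6.0 * math.pi * fm / (c**2 * l)


def light_deflexion(fm, distance, c=C):
    """Total deflexion in radians of a ray passing at the given distance."""
    return 4.0 * fm / (c**2 * distance)


revolutions_per_century = 36525.0 / PERIOD_DAYS
advance = (perihelion_advance_per_revolution(FM_SUN, A_MERCURY, E_MERCURY)
           * revolutions_per_century * ARCSEC_PER_RADIAN)
deflexion = light_deflexion(FM_SUN, R_SUN) * ARCSEC_PER_RADIAN

print("perihelion advance, seconds of arc per century:", advance)
print("deflexion at the sun's limb, seconds of arc   :", deflexion)
```

With these inputs the advance comes out close to 43 seconds of arc per century and the deflexion close to 1.75 seconds of arc.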

9·42. The theory is therefore well supported by observation,
and the general principle that the paths of particles and light
are determined by the behaviour of ds, subject to the coefficients
satisfying relations of the form $G_{ij} = 0$ outside matter,
is in a strong position. But the other point of view is not 
exhausted. It can be asked, and often still is, whether any 
other law than Einstein’s will account for the perihelion shift 
of Mercury and the displacement of star images. This ques- 
tion is habitually ignored in ordinary expositions of the 
theory of relativity, but it is of capital importance. It is well 
known that there is matter within the orbit of Mercury, some 
forming the solar corona and some reflecting the zodiacal 
light. Such matter is qualitatively capable of accounting for 
the perihelion movement of Mercury by its attraction, and 
for the displacement of star images by its refraction. Indeed 
before Einstein’s theory, theories were in existence that
appeared to account for the anomaly in the movement of 
Mercury by the attraction of the zodiacal matter, and also for 
an anomaly that exists in the motion of the plane of the orbit
of Venus*. The latter is of course not touched by Einstein’s
theory, but it is not very much larger than the probable error, 
and might just possibly be due to error of observation. If 
then matter existed in such quantity as to account for any 
important fraction of the anomaly in the motion of Mercury 
or of the displacement of star images, the remainder would 
not be in accordance with Einstein's theory, which would 
therefore be false. But its amount can actually be estimated 
from the amount of light that it reflects, and it can be shown†
to be much too small to account for any appreciable fraction 
of the observed effects. These must therefore be due to a 
departure of the law of gravitation from that of Newton. 

The next question is whether, given that the excess motion 
of the perihelion of Mercury and the displacement of star 
images are of gravitational origin, any other law than Ein- 
stein's would account for them. An answer to this question 
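also can be given. If we return to 9·41 (10) and assume $g_1(r)$
and $g_4(r)$ expanded in series of powers of 1/r, thus:

$$g_1(r) = 1 + A_1/r + B_1/r^2 + \dots, \qquad (1)$$

$$g_4(r) = 1 + A_4/r + B_4/r^2 + \dots, \qquad (2)$$

then the equation $\delta\int ds = 0$ is equivalent to

$$\delta\int\frac{ds}{dt}\,dt = \delta\int L\,dt = 0, \qquad (3)$$

where

$$L^2 = -g_1(r)\,\dot{r}^2 - r^2(\dot{\theta}^2 + \sin^2\theta\,\dot{\phi}^2) + c^2g_4(r). \qquad (4)$$

This leads by the methods of the calculus of variations to

$$\frac{d}{dt}\left(\frac{\partial L}{\partial\dot{r}}\right) - \frac{\partial L}{\partial r} = 0, \qquad (5)$$

and two similar equations in $\theta$ and $\phi$. We see easily that the $\theta$
equation is satisfied if $\theta = \tfrac{1}{2}\pi$ permanently. The $\phi$ equation has
a first integral

$$\frac{r^2\sin^2\theta\,\dot{\phi}}{L} = \text{constant}. \qquad (6)$$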

* Jeffreys, M.N.R.A.S. 77, 1916, 112-118.
† Ibid. 80, 1919, 138-154.
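There is another first integral,

$$\dot{r}\frac{\partial L}{\partial\dot{r}} + \dot{\theta}\frac{\partial L}{\partial\dot{\theta}} + \dot{\phi}\frac{\partial L}{\partial\dot{\phi}} - L = \text{constant}, \qquad (7)$$

which leads to

$$g_4/L = \text{constant}. \qquad (8)$$

Then (6) and (8) together, with $\sin\theta = 1$, give

$$r^2\dot{\phi}/g_4 = \text{constant} = h. \qquad (9)$$

We can use this to eliminate the time from (8); we get on putting

$$r = 1/u, \qquad (10)$$

$$g_1\left(\frac{du}{d\phi}\right)^2 + u^2 = \frac{c^2}{h^2g_4} + A, \qquad (11)$$

where A is another constant.
This is equivalent to

$$g_1\frac{d^2u}{d\phi^2} + u + \frac{1}{2}\frac{dg_1}{du}\left(\frac{du}{d\phi}\right)^2 = \frac{c^2}{2h^2}\frac{d}{du}\left(\frac{1}{g_4}\right). \qquad (12)$$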
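In the actual motion of a planet u is nearly constant. We put

$$lu = 1 + \xi, \qquad (13)$$

where $\xi$ is small and has mean value zero. Then (12) gives
to the first order in $\xi$

$$\frac{1}{l} = \frac{c^2}{2h^2}\left[\frac{d}{du}\left(\frac{1}{g_4}\right)\right]_{u=1/l}, \qquad (14)$$

$$g_1\frac{d^2\xi}{d\phi^2} + \xi = \frac{c^2}{2h^2}\left[\frac{d^2}{du^2}\left(\frac{1}{g_4}\right)\right]_{u=1/l}\xi. \qquad (15)$$

Now for planets more distant than Mercury $h^2/l$ is always the
same, and can be denoted by the fm of Newton’s theory.
Thus $\frac{d}{du}\left(\frac{1}{g_4}\right)$ is sensibly constant for l greater than the mean
distance of Mercury. Thus it is equivalent to its first term,
$-A_4$. Therefore

$$A_4 = -2h^2/lc^2 = -2fm/c^2. \qquad (16)$$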
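Then dividing (15) by (14) we have

$$g_1\frac{d^2\xi}{d\phi^2} + \xi = \frac{\xi}{l}\left[\frac{d^2}{du^2}\left(\frac{1}{g_4}\right)\Big/\frac{d}{du}\left(\frac{1}{g_4}\right)\right]_{u=1/l}, \qquad (17)$$

that is,

$$g_1\frac{d^2\xi}{d\phi^2} + \left\{1 - \frac{1}{l}\left[\frac{d^2}{du^2}\left(\frac{1}{g_4}\right)\Big/\frac{d}{du}\left(\frac{1}{g_4}\right)\right]_{u=1/l}\right\}\xi = 0. \qquad (18)$$

For planets more distant than Mercury $\xi$ is of the form
$e\cos(\phi - \alpha)$, where e and $\alpha$ are constant for each planet. If
in general $\xi$ is of the form $e\cos(p\phi - \alpha)$ we have

$$g_1p^2 = 1 - \frac{1}{l}\left[\frac{d^2}{du^2}\left(\frac{1}{g_4}\right)\Big/\frac{d}{du}\left(\frac{1}{g_4}\right)\right]_{u=1/l}. \qquad (19)$$

For l large $p^2 = 1$, as it should be. For Mercury,

$$p^2 = 1 - 6fm/c^2l, \qquad (20)$$

by observation. Substituting for $g_1$ and $g_4$ in (19) and
equating coefficients of $1/l$ we have

$$A_1 + \frac{2B_4}{A_4} = \frac{2fm}{c^2}. \qquad (21)$$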

Now consider a light wave coming from an infinite distance.
Then L = 0, since ds = 0 for two neighbouring positions of
a light wave, and therefore in (11), A = 0. Also if at a great
distance the velocity is c along a line passing at distance a
from the centre of the sun,

$$h = \left[\frac{r^2\dot{\phi}}{g_4}\right]_{r=\infty} = ac. \qquad (22)$$

Thus

$$g_1\left(\frac{du}{d\phi}\right)^2 + u^2 = \frac{1}{a^2g_4}. \qquad (23)$$

If $\phi = 0$ when $u = 0$ ($r = \infty$) and we neglect the differences
of $g_1$ and $g_4$ from unity, a solution is

$$au = \sin\phi. \qquad (24)$$
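Then as r decreases from $\infty$ to a and increases to infinity
again $\phi$ increases steadily from 0 to $\pi$.

According to (23)

$$[\phi] = 2\int_0^{u_{\max}}\frac{a\,(g_1g_4)^{1/2}\,du}{(1 - a^2u^2g_4)^{1/2}}, \qquad (25)$$

since u has to increase to its maximum and decrease to zero
again. Put

$$au\,g_4^{1/2} = x. \qquad (26)$$

Then

$$[\phi] = 2\int_0^1\left\{1 + \frac{A_1 - A_4}{2a}x + O(a^{-2})\right\}\frac{dx}{(1 - x^2)^{1/2}} \qquad (27)$$

$$= \pi + \frac{A_1 - A_4}{a} + O(a^{-2}). \qquad (28)$$

Thus $\phi$ increases by more than $\pi$ during the passage; the ray
has a curvature towards the sun. The observed deflexion is
$4fm/c^2a$, whence

$$A_1 - A_4 = 4fm/c^2, \qquad (29)$$

and therefore

$$A_1 = 2fm/c^2. \qquad (30)$$

From (21) now

$$B_4 = 0. \qquad (31)$$

It follows that

$$ds^2 = c^2\left\{1 - \frac{2fm}{c^2r} + O\left(\frac{fm}{c^2r}\right)^3\right\}dt^2 - \left\{1 + \frac{2fm}{c^2r} + O\left(\frac{fm}{c^2r}\right)^2\right\}dr^2 - r^2(d\theta^2 + \sin^2\theta\,d\phi^2). \qquad (32)$$

Einstein’s solution was

$$ds^2 = c^2(1 - 2fm/c^2r)\,dt^2 - (1 - 2fm/c^2r)^{-1}dr^2 - r^2(d\theta^2 + \sin^2\theta\,d\phi^2), \qquad (33)$$

so that all the terms in it capable of producing a perceptible
effect are directly demonstrated by the observational data.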
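The chain of inferences from the observational data to the coefficients can be followed with exact arithmetic. The sketch below is an illustration under stated assumptions: all coefficients are expressed in units of $fm/c^2$, the three observational relations are taken as $A_4 = -2$, $A_1 - A_4 = 4$ and $A_1 + 2B_4/A_4 = 2$ in those units, and Einstein's coefficients are read off from the expansions of $1 - 2fm/c^2r$ and its reciprocal.

```python
from fractions import Fraction

K = Fraction(1)  # the unit fm/c^2 in which all coefficients are expressed

# Agreement of the outer planets with Newton's law.
A4 = -2 * K

# Observed deflexion of light: A1 - A4 = 4K.
A1 = 4 * K + A4

# Observed motion of the perihelion of Mercury: A1 + 2*B4/A4 = 2K.
B4 = (2 * K - A1) * A4 / 2

# Einstein's solution: g4 = 1 - 2K/r exactly, and
# g1 = 1/(1 - 2K/r) = 1 + 2K/r + 4K^2/r^2 + ...
A4_einstein, B4_einstein = -2 * K, Fraction(0)
A1_einstein = 2 * K

print("from observation:", A1, A4, B4)
print("Einstein        :", A1_einstein, A4_einstein, B4_einstein)
```

The coefficient $B_1$ is left undetermined by these data, which is consistent with its entering only through terms too small to observe.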


9·5. It might appear that as Einstein’s law of gravitation was
obtained as a result of his considerations of the general re- 
lativity of the laws of nature, and afterwards verified by 
observation, the last discussion is of the nature of a prophecy
after the event. I think, however, that it was really rather in 
the nature of an accident that Einstein’s law was obtained by 
his method and not by one very like that just given. The 
motion of the perihelion of Mercury had been known since 
Leverrier’s theory of the planetary motions, and it was known 
that a slight modification of Newton’s law of gravitation 
would account for it. The only one actually suggested was that 
of Asaph Hall, in which the index in the law was made slightly 
different from 2. If the simplicity postulate had been ex- 
plicitly stated at that time it would have been recognized that 
a law with an index slightly different from an integer is in 
reality an extraordinarily complicated one, and has therefore 
so small a prior probability that it does not merit serious con- 
sideration. The alternative that should have been tried was 
to include in the gravitational force terms varying inversely as 
the third and fourth powers of the distance, and to choose 
the coefficients so as to account for the facts. It would then 
have been found that the ratio of the coefficient of the third
power to that of the second was of the order of $fm/c^2$, and a
direct relation between gravitation and light would have been
indicated. Such a relation had been tentatively suggested by 
Newton himself and by Laplace. The curvature of light rays 
passing near the sun had indeed been predicted by Newton. 
His suggestion had been forgotten; but a discovery of this 
sort would certainly have revived interest in it and led to an 
experimental test. It would then have been found that the 
deflexion was twice what he predicted, and it would have been 
seen that a more drastic revision of Newton’s law was neces- 
sary than the mere addition of a cube term to the gravitational 
acceleration. In that case every essential of Einstein’s law 
would have been obtained before his theory was created, and 
his result would have been merely a mathematical description 
of facts already known. Of course the fact that Einstein was 
able to construct his theory without such previous con- 
siderations is an additional reason for admiring Einstein. 
It might be said that in inferring the law of gravitation
from the empirical facts we have gratuitously assumed that
the departures from the Newtonian law arise from the terms
of the lowest orders in the g’s that are not considered in the
first approximation. But if they arose from later terms extra- 
ordinarily large numerical coefficients would be needed, which 
again are excluded by the simplicity postulate. 

9·6. There is no antagonism between the principle of relativity
and the simplicity postulate; indeed the principle is
itself a thoroughgoing application of the postulate. The 
simplest possible relation between two quantities is that they 
are independent; that is, that when one changes there is no 
associated change in the other. When there is an associated 
change we may either study it directly, or try to construct a 
new quantity that does not change. In Newtonian dynamics, 
for instance, the velocities of the bodies in a system change 
with time. We can proceed to find new quantities, the 
momenta of the system, which do not change with time. We 
may deal with the kinetic energy either by saying that 

change of kinetic energy = work done, 
or we can reverse the sign of the work and introduce an 
apparently new concept, the potential energy, and say that 

kinetic energy + potential energy = constant.

Actually this procedure is of doubtful legitimacy, because 
the work done may depend on the mode of passage from the 
initial to the final state, in the case of non-conservative forces, 
and then the existence of potential energy as a function of the 
state of the system is problematical. It is found, however, that 
the operation of non-conservative forces involves the genera- 
tion of heat, and that there is a relation between the work 
done by these forces and the amount of heat produced. A new 
kind of energy, heat energy, is then invented, and we say that 

kinetic energy + potential energy of conservative forces 

+ heat energy = constant. 
So we proceed, inventing new kinds of energy so as to keep
the principle of conservation of energy true. It is actually 
found at each stage that the new kinds of energy have definite 
properties of their own that warrant their being regarded as 
physical concepts. In modern molecular physics heat energy
itself has come to be regarded as kinetic and potential energy 
of agitation of the molecules, thus making it possible to 
regard all forces in the last resort as conservative; the dis- 
tinction is then between large-scale or molar motion that we 
can observe in dynamical experiments, and small-scale or 
molecular motion that is not directly observable as motion, 
but can be detected either by the sensation of heat or by the 
production of thermal expansion. In all stages we detect the 
operation of a prior probability that something is constant, 
and that our problem is to find out what it is. 

But the conservation of energy and momentum does not 
comprise the whole of dynamics. To find the actual relative 
motion of the parts of a system we still need the equations 
of motion, or their equivalent, Hamilton’s principle; and 
these imply the conservation of energy and momentum but 
are not implied by them. We prefer Hamilton’s principle to 
the equations of motion because it is expressible directly in 
terms of the ultimate ideas of distance and time, whereas the 
equations of motion, apparently at least, involve the co- 
ordinate system. Hamilton’s principle is not a conservation 
principle ; the essence of it is that the integral involved in it is 
not stationary if the path we begin with is anything but a 
dynamically possible path. But we do notice in it that the 
statement is independent of the system of co-ordinates ; and it 
is this fact, applicable to the whole well-verified region of 
Newtonian dynamics, that is the basis of the belief that the 
ultimate laws of nature can be stated in a form independent 
of the co-ordinate system. By the simplicity postulate we are 
therefore entitled to say with a high probability that this prin- 
ciple is true in general. But this principle is precisely the 
principle of general relativity. The special theory of relativity 
enables us to extend this to the motion of light, without disturbing
it for material particles outside of a gravitational
field, only an ultimate physical constant, the free velocity of 
light, appearing. But in the special theory it turns out that 
dynamical time is on a very similar footing to the three posi- 
tion co-ordinates, and the principle of relativity has to be 
extended to allow the system of reference to include four 
variables, three position co-ordinates and the time, which 
can then be transformed freely for systems of reference in 
motion relative to each other. Incidentally it turns out that 
there are not two fundamental ideas, distance and time, but 
one fundamental idea, ds; and that the coefficients in $ds^2$
in the general theory satisfy differential equations that have 
the same form in any transformation, just as the gravitation 
potential of Newtonian dynamics satisfies a differential equa- 
tion that is not affected by changes of axes. 

It must be said that in spite of the high probability we can 
now attach to the principle of general relativity, it is still on 
its trial. It is really verified so far only in the motion of a 
body or light ray of negligible mass in a symmetrical field. 
The obvious generalization to a system of many particles 
would be to attribute a mass and a ds to each, and to say 
that $\sum m_i\int ds_i$ is stationary. But a deeper analysis appears to
be necessary, and the application of the principle to even the 
problem of two bodies has not yet been carried out, on 
account of the mathematical difficulties. 

A further problem concerns the size of the universe. The
coefficient $g_{44}$ outside a spherical body, we have seen, is
$c^2(1 - 2fm/c^2r)$. If the universe has density $\rho$ and radius a, then
just outside it $g_{44}$ will be $c^2(1 - \tfrac{8}{3}\pi f\rho a^2/c^2)$. This is negative
if $\rho a^2$ exceeds a certain value, and the corresponding local
velocity of light is imaginary. There is thus a definite upper 
limit to the size that the universe can have if its mean density 
is given; and there is a lower limit to its size if its total mass 
is given. Various solutions of the problem have been attempt- 
ed, but there is so far no definite answer. 
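The two limits can be made concrete with a short calculation. This is a sketch only: it assumes the coefficient just outside a uniform sphere is proportional to $1 - \tfrac{8}{3}\pi f\rho a^2/c^2$, and the density used is an arbitrary illustrative figure, not an estimate from the text.

```python
import math

F = 6.674e-11      # constant of gravitation, m^3 kg^-1 s^-2 (assumed value)
C = 299_792_458.0  # free velocity of light, m/s


def coefficient(rho, a):
    """g44/c^2 just outside a uniform sphere of density rho and radius a."""
    return 1.0 - (8.0 / 3.0) * math.pi * F * rho * a**2 / C**2


def max_radius(rho):
    """Largest radius for which the coefficient is not negative."""
    return math.sqrt(3.0 * C**2 / (8.0 * math.pi * F * rho))


def min_radius(mass):
    """Smallest radius for a given total mass: 2fm/c^2."""
    return 2.0 * F * mass / C**2


rho = 1e-26  # kg per cubic metre, purely illustrative
a_max = max_radius(rho)
mass = (4.0 / 3.0) * math.pi * rho * a_max**3

print("upper limit of radius for this density, m:", a_max)
print("lower limit of radius for the same mass, m:", min_radius(mass))
```

The two limits coincide for a sphere at the critical size, since both express the single condition that the coefficient vanishes at the boundary.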
The chief outstanding problem, however, is in relation to
electricity and magnetism. It was shown by Lorentz and 
Larmor that the equations satisfied by the electric and mag- 
netic forces satisfy the special theory of relativity, and 
numerous electrical experiments designed to detect absolute 
velocity led to null results. But light waves are electro- 
magnetic in character, and are now known to be affected by 
gravitation. Thus gravitation and electromagnetic phenomena 
interact, and the question is, do the laws of this interaction 
satisfy the general theory of relativity? The question really 
presupposes a condition analogous to the classical “First
catch your hare”. We cannot test these laws experimentally
until they have been produced, and although several experts 
have produced theories they do not seem to have satisfied 
one another. 

The general theory of relativity is therefore justified as a 
physical law up to a certain point, and the simplicity postulate 
entitles us to extend it further, if possible. This extension is 
a matter for the future and for further experimental in- 
vestigation. 
“How is bread made?”

“I know that!” Alice cried eagerly. “You take some flour”

“Where do you pick the flower?” the White Queen asked. “In a 
garden, or in the hedges?” 

“Well, it isn’t picked at all”, Alice explained; “it’s ground — —”

“How many acres of ground?” said the White Queen. “You mustn’t 
leave out so many things.” 

Lewis Carroll, Through the Looking Glass 

This chapter is devoted to a number of incidental considera- 
tions that have so far escaped attention. 

10·1. Is there a non-quantitative simplicity postulate? Let us
consider such a biological proposition as the following. 
All animals with feathers have beaks, two legs, two wings, and 
warm blood. 

We might try to analyse this as a proposition in sampling. 
“Being an animal with feathers” would then be the property
a of a class, m members of which have been observed.
“Having a beak” is taken as the property b possessed by l of
the observed members. Then according to Laplace’s theory
of sampling the probability that the next member examined
has the property b is $(l + 1)/(m + 2)$; and if all the observed members
have had the property, so that l = m, the probability that the
whole class, of number n say, has the property b is $(m + 1)/(n + 1)$.

This result, in relation to the proposition under considera- 
tion, seems to be at variance with general opinion. I have 
observed a large number of animals with feathers, but I 
suppose that they constitute less than i in 10,000 of the 
animals with feathers in England. According to Laplace’s 
theory, then, $(m + 1)/(n + 1)$ is under 1/10,000; the probability
that I should attribute, on the data, to the proposition
that all such animals in England have beaks does not exceed 
this trivial fraction. Actually I seem to attribute to it a prob- 
ability approaching certainty. The same is, I believe, the 
position of most ornithologists. Our problem is to consider 
the reason for this great departure from Laplace’s theory. 
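The figures can be reproduced with exact fractions. The sketch assumes the two results of Laplace's theory of sampling in the forms $(l + 1)/(m + 2)$ and $(m + 1)/(n + 1)$; the sample and class sizes are hypothetical, chosen only so that the sample is less than 1 in 10,000 of the class.

```python
from fractions import Fraction


def prob_next_has_property(l, m):
    """Probability that the next member examined has the property,
    when l of the m observed members have had it."""
    return Fraction(l + 1, m + 2)


def prob_whole_class_has_property(m, n):
    """Probability that all n members have the property,
    when all m observed members have had it."""
    return Fraction(m + 1, n + 1)


m = 500          # hypothetical number of feathered animals observed
n = 10_000_000   # hypothetical number in the whole class

p_next = prob_next_has_property(m, m)
p_all = prob_whole_class_has_property(m, n)

print("next member has a beak:", float(p_next))
print("whole class has beaks :", float(p_all))
```

The contrast is the point at issue: the theory gives a probability near certainty for the next member examined, but a trivial one for the general proposition.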

It might be thought that since “animal with feathers” is 
so widely recognized a concept as to have had the name bird 
associated with it, and as such to have been mentioned in 
literature on countless occasions, the information provided 
by other people contributes largely to one’s estimate of the 
probability. If such a concept is generally recognized, a 
“bird” without a beak would attract attention and be com- 
mented on, and the absence of comment gives some ground 
for supposing that nobody has seen such a thing. In this 
particular proposition such considerations certainly carry 
weight ; but I do not consider them the ultimate reason. In 
the first place, other people’s judgments are not known to me 
directly ; the things that I know directly about other people are 
their appearance and the sensations of sound that they produce, 
and the appearance of the marks they make on paper. When 
I attribute to their sounds and writings meanings similar to 
those I express when I make similar sounds and marks myself, 
I am making an inference. It seems to me that such an in- 
ference must rest again on observed similarities between other 
people’s behaviour and my own, which are generalized as 
part of the science of psychology, and depend for their 
acceptance as general propositions on a theory of sampling. 
I have not examined a large fraction of the inhabitants of 
England to find out whether they do seem to attribute the 
same meanings to propositions that I do, and when I assume 
that those I have not examined do so I am making an infer- 
ence, which on Laplace’s theory would itself have as small a 
probability as the proposition about birds. The introduction 
of testimony therefore only shifts the issue without affecting 
its ultimate nature. 
The situation occurs, indeed, when no question of testimony
arises. A botanist finds a plant that does not fit any
description already recorded. He immediately calls it a new
species, and publishes a description of it; that is, a new concept
is created on the basis of a single observed instance.
There is no question of anybody else having observed the 
same species. But we notice that one property does not make 
a species. If a botanist found, for instance, a plant agreeing 
in all particulars with the descriptions of the upright buttercup,
but possessing no petals, he would not publish a description
of a new species, Ranunculus apetalus. He would
call it a specimen of Ranunculus acer without petals*. The
mere possession of one unusual property does not constitute 
a new species, but merely a freak. There must be a conjunc- 
tion of several new properties, and then it is expected that 
some at any rate of these will always be associated in future 
instances. It is utility for purposes of prediction here, as in 
quantitative laws, that coincides with the introduction of a 
new concept. The principle seems to be that if an object of a
given class has r properties a, b, c, ... k, then there is a finite
prior probability that all future members of the class with
any r − 1 of these properties will also have the remaining one.
This probability is a moderate number, independent of n,
the number of members of the class with r − 1 of the properties
in the world. If it was merely 1/n, we should be back
to Laplace’s theory ; and we seem to have reached again the 
principle that Laplace’s assessment of the prior probability 
is wrong for the extreme cases where all or none of the 
members in the world have the property under discussion. 
But if the prior probability is moderate, whatever n is,
it appears that repeated verifications will make the probability 
of the law approach certainty, as for quantitative laws. Then 
we have a simplicity postulate applicable to non-quantitative 
laws. 

* Unless he forgot his acer, acris, acre and called it Ranunculus
acris.
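
The claim that a moderate prior probability, independent of n, lets repeated verifications carry a general law towards certainty can be illustrated numerically. The sketch below is one possible model and not the author's own calculation: it assumes a prior probability k that every member of the class has the property, with the remaining 1 − k spread evenly over r = 0, 1, ..., n − 1 members having it, and a sample of m drawn without replacement. The names posterior_all and k are hypothetical.

```python
from fractions import Fraction
from math import comb


def posterior_all(k: Fraction, m: int, n: int) -> Fraction:
    """Posterior probability that all n members have the property after
    m members, drawn without replacement, have all been found to have it.

    Assumed prior (not from the text): probability k on r = n, and the
    remaining 1 - k spread evenly over r = 0, 1, ..., n - 1.
    """
    like_all = Fraction(1)  # the sample is certain if r = n
    # Average of C(r, m) / C(n, m) over the n alternative values of r.
    like_rest = Fraction(sum(comb(r, m) for r in range(n)), n * comb(n, m))
    return k * like_all / (k * like_all + (1 - k) * like_rest)


k = Fraction(1, 2)  # an arbitrary "moderate" value
for n in (100, 10_000):
    for m in (1, 10, 50):
        print(n, m, float(posterior_all(k, m, n)))
```

Under these assumptions the posterior probability rises with the number m of verifications and is nearly the same for both values of n, in contrast with Laplace's (m + 1)/(n + 1).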
We may hazard a solution of this question by considering
the prior probability that a point may lie within a given
interval on a line. If the line is infinite in length both ways,
so that there is nothing to distinguish one interval from
another, the prior probability is uniformly distributed and
the probability that the point lies in a given interval is pro- 
portional to the length of the interval. Strictly this makes the 
probability that it may lie in any finite interval infinitesimal, 
but we need consider only the ratios of the probabilities for 
different intervals, which are perfectly definite; and when 
measures are introduced factors such as enter into 

the inverse probability and make the posterior probabilities 
definite. If the point is restricted only to lie to the right of a 
given origin on the line, and we have no previous knowledge 
about its distance x from that origin, the prior probability 
that it lies in a short interval dx is proportional to dx/x; for
with any other law the probability that it lies between x and
2x would depend on x, and therefore there would be a
previous criterion suggesting a scale of distance. Now suppose
that we know initially that x lies between 0 and 1. The prior
probability that it lies in a range dx must be symmetrically
distributed about the centre of the range, so that it must be
of the form f{x(1 − x)} dx. But when x is small the influence
of variable distance from 1 must be inappreciable, and therefore
when x is small

f{x(1 − x)} ∝ 1/x.

But in this region f{x(1 − x)} is nearly f(x), and the required
law is therefore that the prior probability that x is in a range
dx is proportional to dx/{x(1 − x)}; and integrating this we see
that the prior probability that it lies between a and b (b > a)
is proportional to log {b/(1 − b)} − log {a/(1 − a)}. The fact that this
tends to infinity when b tends to 1 or a to 0 is not really
serious, because in actual measures the extreme values are 
usually excluded by the limits of error. Now this suggests a 
form of the prior probability in the theory of sampling. We
are given there that a ratio x = r/n is at least 0 and at most 1.
Then the prior probability of a given value of r may be taken
as proportional to

1/{x(1 − x)} = n²/{r(n − r)}.

Thus the f(r) of the theory of sampling should be taken to be
inversely proportional to r(n − r). This makes f(r) tend to
infinity at the extreme values; but as before this is not serious,
for so long as the sample is homogeneous the extreme values
are still admissible, and we do attach a high probability to the
proposition that the whole class is of one type; while as soon
as any exceptions are known the extreme values are completely
excluded and no infinity arises. Such a form of f(r)
seems therefore to be just what is needed to provide a
simplicity postulate for non-quantitative laws.
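
The integral of the prior between a and b, and its equivalent form in the theory of sampling, can be verified with a short numerical sketch. The function names, the midpoint rule used as a cross-check, and the particular values of a, b, r and n are illustrative assumptions and do not come from the text.

```python
import math


def prior_mass(a: float, b: float) -> float:
    """Closed form: integral of dx / (x (1 - x)) over [a, b], 0 < a < b < 1."""
    return math.log(b / (1 - b)) - math.log(a / (1 - a))


def prior_mass_numeric(a: float, b: float, steps: int = 100_000) -> float:
    """Midpoint-rule estimate of the same integral, as a cross-check."""
    h = (b - a) / steps
    return sum(h / (x * (1 - x))
               for x in (a + (i + 0.5) * h for i in range(steps)))


def f_sampling(r: int, n: int) -> float:
    """Unnormalized sampling prior n^2 / (r (n - r)), for 0 < r < n."""
    return n * n / (r * (n - r))


a, b = 0.2, 0.7
print(prior_mass(a, b), prior_mass_numeric(a, b))

# Symmetry about the centre of the range: [a, b] and [1 - b, 1 - a]
# receive the same prior mass.
print(prior_mass(1 - b, 1 - a))

n, r = 50, 10
x = r / n
print(f_sampling(r, n), 1 / (x * (1 - x)))  # the two forms agree
```

The sketch avoids the end points r = 0 and r = n, where the expression is infinite, as the text notes.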

It may happen that an observed new conjunction of 
properties breaks down in further instances. This is precisely 
the case where the botanist cannot find permanently associated 
characters to describe his species properly, and occurs in such 
“difficult” genera as Rosa and Hieracium. From our point of
view these are instances of suggested laws, with finite prior 
probabilities, that have broken down under crucial tests. 

It appears that such a principle is of great importance in 
the theory of our knowledge of the world, and that the validity 
of even the concept of objects itself depends on it. 

If we return to the notion of a bird now, we see that 
feather really expresses in itself the conjunction of numerous 
properties. A feather has a central horny quill, fringed by 
numerous filamentous hairs so arranged as to lie side by 
side nearly in a plane, and so that their ends lie on a smooth 
curve. It is this conjunction of properties that justifies the 
introduction of the concept and the attachment of a definite 
name to it. Similarly beak implies a horny projection on the 
face, carrying with it a mouth and nostrils; again several 
properties are associated. The observed usual conjunction of 
the two sets of properties, and also those of two legs, two
wings, and warm blood, warrants the inference that the con- 
junction holds in general and therefore the introduction of 
the concept bird. 

It happens that, so far as our knowledge goes, all animals 
with feathers have the other properties mentioned ; the con- 
verse is not true. Thus the duck-billed platypus has a beak 
and warm blood, but has not feathers and has four legs and 
no wings ; man has two legs and warm blood, but not a beak 
or feathers. What if there were no single defining property; 
if, that is, there were animals with a beak, two legs, two wings, 
and warm blood, but covered with hair instead of feathers?
Should we then have to abandon the notion of bird? I think
not. We should call the new creatures birds with hair, just 
as we call the duck-billed platypus a mammal with a beak ; or 
else we should retain the definition in terms of feathers and 
deny that the new creatures are birds at all. There would 
probably be a vigorous discussion in the zoological journals 
as to which course was the correct one, but in any case the 
decision is a matter of convention, like the assignment of a 
name to the concept in the first instance. The important thing 
is the observed usual conjunction of the properties, upon 
which we base the inference that the properties are likely to 
be associated in future instances. The existence of an occa- 
sional exception does not disprove the rule ; it merely suggests 
new lines of inquiry. The concept is merely a way of express- 
ing the rule concisely. 

After the above passage had been written I came upon the 
following, in a paper by Dr A. Wohlgemuth*. 

“The point has been admirably stated by Freud’s col- 
league, Joseph Breuer: 

“‘All too easily one gets into the habit of thought of assuming
behind a substantive a substance, of gradually understanding
by consciousness an entity. If then one has got used
to employing local relations metaphorically as, e.g., subconscious,
as time goes on an idea will gradually develop in
which the metaphor has been forgotten, and which is as
easily manipulated as a material thing. Then mythology is
complete.’

“Breuer recognized the slippery slope down which Freud
rushed away from scientific fact, and called a warning halt,
but, alas, too late.”

* J. Medical Psychology, 5, 1925, 105.

With the statement of psychological fact that we do get 
into the habit of assuming a substance, or, as I prefer to say, 
a concept, behind observed conjunctions of properties, and 
that the concept comes to be manipulated as directly and 
easily as a material thing, I am in complete agreement. From 
the statements of opinion by the two authors quoted that this 
occurs too easily and that it constitutes a means of rushing 
away from scientific fact, I dissent completely. These opinions 
are a direct negation of the whole of the scientific procedure 
of constructing concepts, and there would be no such thing 
as science, or, indeed, as everyday knowledge, if they were 
accepted. It is precisely the utility of concepts in sum- 
marizing existing knowledge that makes it possible to keep 
scientific facts classified and accessible, and therefore to make 
progress as new laws are discovered. The existence of dy- 
namics, for instance, depends first on abstracting the concepts 
of physical objects and events from sensations; then on the 
concepts of intervals of time and distance, derived from events 
and objects; then on those of mass and force, derived from 
the observed relations between distances at different times. 
At each stage the concept gets further away from the original 
facts ; but at each stage also it makes it possible to infer more 
facts. The double aspect of the construction of concepts is 
not antagonistic to scientific method, but on the contrary is 
the very essence of it. 

10·2. Ultimate concepts. In the development of knowledge
our fundamental data are sensations and certain a priori prin- 
ciples of logic and probability, and as we proceed we construct 
concepts of increasing generality from them. Is there any
reason to suppose that the process will ever stop? If so, the 
concepts reached at this final stage may be called ultimate 
concepts. It has happened that for ages certain concepts 
have been thought ultimate, but are now proving to be express- 
ible in terms of more general ones. Thus distance and time, 
which were long thought to be ultimate and absolutely general, 
are found to be approximations involving a certain amount of 
ambiguity, the more general concept behind them being the 
ds of the theory of relativity. The physical object itself, with 
its characteristic dynamical property of impenetrability, is no 
longer a continuous “piece of matter” occupying a definite
region of space. It lost that status when Dalton showed that 
the simple numerical relations that arise in the laws of 
chemical combination could be explained if matter consisted 
of molecules, each molecule of a definite substance consisting 
of a finite number of atoms, the total number of kinds of 
atoms being finite. It followed at once that a piece of a 
chemical compound could not be, in the last resort, a region 
of space with the same properties at all points. The molecular 
theory led further, in the hands of Waterston, Boltzmann, 
Maxwell, and others, to mechanical explanations of Boyle’s 
and Charles’s laws in gases, and the viscosity, diffusion, and 
thermal conductivity of gases. In modern physics, experi- 
ment in rarefied gases reaches directly not only the molecule, 
but the atom; and even the atom proves to have properties 
explicable on the hypothesis that it is made up of only two 
kinds of entity, the proton, positive nucleus or hydrogen ion, 
carrying a positive charge, and the electron, carrying an equal 
negative charge. Application of this notion of the ultimate 
constitution of matter to solid crystals has led W. H. and 
W. L. Bragg to explanations of their behaviour in reflecting 
X-rays, and Max Born and J. E. Lennard-Jones to quantitative
explanations of their elastic, optical, and electrical properties. 
The principle that matter is made up of protons and electrons 
is therefore in a strong position. But the impenetrability of 
matter has lost its generality. In a gas under ordinary con-
ditions the region actually occupied by the molecules is under 
a thousandth of that of the whole ; even in a solid the protons 
and electrons do not occupy more than an exiguous fraction 
of the whole region within the apparent outer surface. We 
cannot as a matter of fact push one piece of matter through 
another without meeting a resistance; but this resistance is 
explained by the theory. Further, electrons can be made to 
pass right through films or plates of solids, finding their way 
between the constituent protons and electrons. 

These modern views on the constitution of matter did not 
lead directly to the abandonment of the idea of a physical 
object as an ultimate reality, but rather to the attitude that 
the object, as usually understood, is composed of smaller 
things, which are still objects; that is, like the physical object 
before the time of Dalton, they have definite positions at any 
time, and no two of them can occupy the same region. But 
even this position is being assailed by the new quantum 
mechanics. According to Heisenberg’s uncertainty principle, 
which is a simple consequence of any quantum theory, it is 
never possible to measure the position and velocity of a 
particle accurately at the same time ; whichever of them we try 
to measure, the process affects the other, and an indeterminacy 
remains in both. Relativity has left us thinking that an event 
can be specified by stating exact values of four variables, three 
position co-ordinates and the time. Heisenberg leaves us in 
doubt as to whether these variables can have any exact values 
at all ; and if the position of a particle is indefinite it becomes 
doubtful whether the statement that two particles cannot be 
in the same position has any meaning. 

In the various forms of the new quantum mechanics the 
four variables needed to specify the time and the position of 
any particle have ceased to be physical magnitudes at all ; a 
single numerical measure is not enough to specify any one of 
them. In Heisenberg’s theory each is replaced by a matrix, 
an assemblage of several magnitudes; in that of Dirac the 
co-ordinates and the corresponding momenta are what he
calls q-numbers, which do not satisfy the ordinary rule of
multiplication pq = qp. In the theory of Schrödinger an
entirely new variable, the wave-function ψ, appears, which
satisfies a certain differential equation, and the observed 
phenomena of electron emission, radiation, and so on emerge 
as expressions of the properties of the wave function. All 
three theories appear to give the same answers, and to be well 
confirmed by experiment. But all agree in that the ultimate 
particles do not have definite co-ordinates at any instant. 
The proton and the electron, as particles with definite posi- 
tions, have disappeared. Whereas on the older quantum theory 
a hydrogen atom consisted of one proton with one electron 
moving in a definite orbit about it, on the new theories the 
proton and electron have lost their individuality and can only 
be said each to fill the whole region occupied by the atom. 
As we cannot observe the positions of the electron at various 
points of its alleged orbit, and should certainly alter the 
orbit if we tried, there is no experimental objection to this 
view. It is only when the electron emerges from an atom 
and travels freely that it behaves as an individual, and 
in these conditions the new theories describe its actual 
behaviour. 
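
The remark that matrices need not satisfy the ordinary rule of multiplication pq = qp can be made concrete with a small sketch. The two matrices below are arbitrary and chosen only for illustration; they are not the position and momentum matrices of any physical system, and the helper matmul is a hypothetical name.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size))
             for j in range(size)]
            for i in range(size)]


# Two arbitrary 2 x 2 matrices, chosen only for illustration; they are
# not the position and momentum matrices of any physical system.
p = [[0, 1],
     [0, 0]]
q = [[0, 0],
     [1, 0]]

print(matmul(p, q))  # [[1, 0], [0, 0]]
print(matmul(q, p))  # [[0, 0], [0, 1]]
```

The two products differ, so the order of the factors matters, which is the property the text attributes to the quantities of Heisenberg and Dirac.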

On Schrödinger’s theory the co-ordinates appear explicitly
in the differential equation, though the electron as a thing 
with definite co-ordinates has disappeared. Thus the notion 
of position in space remains though nothing has a definite 
position. This situation is somewhat paradoxical, and an 
attempt has been made to overcome it by constructing from 
ψ a real function, which is said to represent the probability
that a given position is occupied at a given time. Thus we
have to speak in our ordinary sense of the probability of
Schrödinger’s differential equation as a scientific law, and
yet the equation itself deals with probability. We are in the 
position of having to speak of the probability of a law of 
probability. The complication is not really a new one, because 
it arises in the treatment of the law of error when the standard
error has to be determined from the observations. In that 
case we have had to speak of the prior probabilities of dif- 
ferent standard errors, that is, of different laws of error, where 
each law is itself a statement about the distribution of prob- 
ability among different possible values of the error. Perhaps, 
in addition to the a priori laws of probability that underlie all 
inference, there are other laws of probability that have to be 
found as far as possible from experience and therefore have 
probabilities themselves. 

The question does not arise in the theories of Heisenberg 
and Dirac, for the co-ordinates in them are not single real 
magnitudes. The position has just the same degree of indefiniteness
as the particle that is said to occupy it*. This
consideration, combined with the formal simplicity of Dirac’s 
theory, seems to place it in the best position of the three. 
But the formal simplicity of Dirac’s laws does not always 
make it easy to solve his equations in special cases, and it is 
often found that the solution of his equations is most easily 
obtained by Schrödinger’s method. This is really because
Schrödinger’s method uses only ordinary mathematics, while
Dirac’s numbers that do not satisfy the commutative law of 
multiplication require the construction of a new branch of 
mathematics, which is not yet fully carried out. 

The existence of three such theories, all giving results in 
agreement with the facts, but formally quite different, leaves 
us in considerable doubt about ultimate concepts. A fruitful 
source of philosophical discussion is the reality of scientific 
concepts. So far as I can see what is usually meant by this 
is the existence or otherwise of atoms, electrons, light waves, 
and so on as ultimate realities in the same sense as physical 
objects appeared to be ultimate realities to the eighteenth- 
century physicist. The answer to this seems to be definitely 
in the negative; but the question is replaced by that of the 

* The indefiniteness is much like that in the statement that the equation
(x − 3)² + ε = 0, with ε small and positive, has roots near x = 3, though actually there is no real root.
reality of co-ordinates, momenta, and wave-functions. It
seems to me that this question may well be postponed till 
we have made more progress with the various new quantum 
theories, particularly in the direction of co-ordinating them 
with the general theory of relativity. In any case the concepts 
that appear explicitly in the theories are quite different in 
character from physical objects. From the standpoint of 
scientific method the one and only test of the validity of con- 
cepts is whether the laws they are supposed to satisfy explain 
our sensations ; whether this is also a ground for attributing 
philosophical reality to them is a different question. 

10·21. The question of ultimate concepts arises again in
such biological questions as the materialistic interpretation of 
physiology and the physiological interpretation of psycho- 
logy. Modern research has shown that many physiological 
processes satisfy quantitative laws like those of physics 
and chemistry, and in many cases that these processes can 
actually be interpreted in terms of physics and chemistry. 
Are we justified in inferring that all physiology is reducible 
to physics and chemistry? It must be remembered that when 
the question was formulated the atom was considered an 
ultimate reality ; the result of modern developments in physics 
is that we are asking whether physiological processes can be 
explained in terms of q-numbers or ψ-functions. The alternative
is that there is a non-physical concept, which we call
life, and which may be ultimate. The problem of materialism 
is to explain life. Life as it stands is a valid scientific concept 
because it explains observed phenomena; a live animal has 
different properties from a dead one. That is not to say that 
it is an ultimate concept. There seem to me to be two relevant 
indications, pointing in opposite directions. The growth of 
green plants involves the interaction of carbon dioxide and 
water to produce sugar or starch and oxygen, a reaction re- 
quiring the absorption of energy, which the plant obtains 
from the sun’s radiation. Carbon dioxide and water are 
ordinarily stable in each other’s presence; the plant must
apparently have some directing ability, applying the solar 
energy in just such a way as to upset this stability. The same 
applies to the obscure organisms that derive their energy 
from chemical reactions without the presence of light, re- 
actions that do not take place spontaneously, but only under 
the influence of the plant itself. On the other hand, if or- 
ganisms have a directing power, of molecular fineness, as 
this would suggest, they might apply it to the sorting out of 
molecules according to their velocities. Then they could 
upset the second law of thermodynamics and provide for 
themselves all the available energy they need. This does 
not appear to happen; physiological processes in animals 
and plants appear to follow the second law of thermodyna- 
mics. The hypothesis that life is not an ultimate concept 
remains untested. 

10·22. Our primary data being sensations, it may be said
that the aim of science is to account for sensations in terms 
of ultimate concepts and their properties. On the materialist 
theory these ultimate concepts are those of physics and no 
others. The physiological interpretation of psychology does 
not go so far as this, but states that psychological phenomena 
can or will be reduced to physiology. The experimental study 
of sensation has gone some way in the explanation of the 
transmission of sensations to the brain, but little has been done 
towards understanding what happens to them when they get 
there. The opinion that the amazing complexity of mental 
processes, including recognition of sensations, emotions, 
reasoning, and volition, can be reduced to physiological pro- 
cesses, is hypothesis ; it may be true or not, but it is certainly 
at present pure unverified hypothesis. Further, all the 
mental processes just mentioned have in common the fact 
that they involve, to varying degrees, conscious criticism. 
This is directly recognized and therefore is a fundamental 
concept. One way of studying it is to examine mental 
behaviour when it is removed as far as possible, and to see
what differences arise. 

It is therefore a legitimate procedure to study mental be- 
haviour when conscious criticism is, as far as possible, 
eliminated. The absence of criticism is best realized in dreams 
and in the psychoanalytic situation, where the patient, as a 
regular matter of technique, says everything that comes into 
his mind without criticism. The results are not chaotic ; they 
are found to arrange themselves according to perfectly 
definite rules of resemblance, which are scientific laws. They 
differ from the rules of conscious criticism, the function of 
which is to observe and study them ; and they are found to 
be closely related to the forgotten experiences of childhood 
and the pitiless logic, based on incomplete data, of the enfant 
terrible and the child at still earlier ages. The result is the 
discovery of a whole region of mental activity, with laws 
of its own, and demanding new concepts to express them. 
Freud’s Unconscious is the general name for this region ; for 
details of its structure reference must be made to the special 
literature of the subject*. 

The results of psychoanalysis have been criticized on 
various grounds, which seem to me to merit discussion here 
because they involve points of principle applicable to any 
science. One line of attack is simply to deny the facts as dis- 
covered, or the truth of the relations found between them. 
This is merely a matter of refusal to investigate, and does not 
impress the analyst who is dealing with the material every 
day, or the patient who has been cured of various mental 
disorders, ranging from minor anxieties to phobias or dis- 
abling neuroses, by being enabled to understand his own 
mental processes better.

A more subtle attack is to say that psychological processes 
are really the expressions of physiological ones, and that the 
solution of the problems investigated must come ultimately 
from physiology. This may be true. But to use it as a basis 

* See especially Freud, The Ego and the Id. 
of procedure is not legitimate, because it assumes from the
start that there are no ultimate mental concepts, or, what is 
the same thing, it takes for granted that there are relations 
that completely determine the phenomena of conscious mental 
activity in terms of those of physiology before we know what 
they are. Instead of inferring the laws from the data, the 
invariable scientific procedure, it begins with unstated laws 
and treats the data as a ground for optimism about the future. 
The situation is the same as if an engineer in process of 
designing a bridge was told that he should not attend to 
experimental evidence about the strength of his materials 
because all phenomena of elastic fatigue, like other elastic 
phenomena, may some day be explained in terms of modern 
atomic and quantum theory. It may be so ; but he wants to 
get the bridge built. 

It has also been said that the phenomena are not quanti- 
tative and therefore not scientific. This consideration would 
obviously invalidate the greater part of biology; but it would 
also apparently invalidate the notion of the physical object 
itself. Quantitative study always rests on a basis of facts 
recognized qualitatively, and the fact that we cannot as yet 
measure emotions quantitatively and predict their measures 
is no ground for saying that emotions do not exist when we 
know perfectly well that they do, or that they obey no 
scientific laws when considerable knowledge of those laws 
has in fact been attained. 

A further consideration is that even if such a hypothesis is 
correct we should still be under an obligation to investigate 
whether its consequences are true. That implies investigating 
mental phenomena, and providing explanations of the facts 
that psychoanalysis has already disclosed. The hypothesis 
saves no work, but merely attempts to delay it. 

10·23. The criterion of philosophical reality has been put in
the form “do things exist when they are not observed?”
From our point of view this is scarcely a question at all. Our 
primary data are sensations, which definitely do not exist
when they are not observed, and a priori laws, which have 
the property of truth whether we know them or not. The 
reality of concepts, on the other hand, is not explicitly in- 
volved in the question. To ask whether a physical object 
exists when it is not observed assumes that it sometimes is 
observed, and this is untrue. The physical object exists only 
in the sense that it helps to explain sensations; it is never 
observed directly. To say that “we observe an object” is
really a shorthand for saying that we have a series of sensations 
which are co-ordinated by forming the concept of an object. 

In another form, however, the question is significant. We 
observe the direction of the planet Jupiter at various times 
and predict its position at other times. We also observe a 
minor planet and predict its position at any time, allowing for 
the attraction of Jupiter on it in the meantime. The results 
are verified irrespective of whether we actually do measure 
the position of Jupiter in the meantime. It is of course well 
known that Neptune and the companion of Sirius were dis- 
covered through their perturbations of Uranus and Sirius 
respectively ; their gravitational effect was known before they 
had been seen at all. Our most direct reason for saying that 
Jupiter or Neptune exists is that we can see it if we take the 
proper steps ; but the motion of other bodies due to it is the 
same whether we actually observe Jupiter or Neptune in the 
meantime or not. There is no reason in principle for choosing 
the direct visual sensation rather than the perturbative effect 
as our ground for forming the concept of Jupiter or Neptune. 
The two grounds express co-ordinations of different sets of 
sensations, that is all. If a concept is formed as a result of 
one law, and subsequently a second law is found to be true 
in terms of it, we may often just as well take the second as 
the definition of the concept ; and the two together express 
a greater generality of the concept than is implied by either 
alone. In this case, by visual observations, we infer the law of 
gravitation, giving certain co-ordinates, which we say express 
the positions of the planets at any time with a very high degree
of probability. These co-ordinates exist at intermediate times 
because we can calculate them, and when observations are 
made the inferred values are found correct ; no further justi- 
fication is necessary. 

We nevertheless need to allow occasionally for the possibility
that certain events may occur only when opportunity
arises for observing them. Thus a traveller observing the 
United States from the train alone might be pardoned for in- 
ferring a law* that a bell is always ringing at railway crossings. 
What he observes is that this law holds when his train is near 
a railway crossing; he has no opportunity of observing that 
the bell rings only when a train is near. On the face of it this 
is a case of error introduced by the nature of the observing 
instrument, but so extreme as to be trivial. Yet it is quite 
analogous to a stage in the development of modern physics. 
At the time of the Michelson and Morley experiment 
physicists generally believed in an all-pervading ether, which 
transmitted electric waves, including light waves. The experi- 
ment, like many others, was designed to detect motion relative 
to this ether. The failure to obtain any positive result led to 
the opinion that, though there must be an ether, whenever 
we tried to detect motion with respect to it circumstances 
conspired to make it impossible to do so. Physicists holding 
this view were effectively saying that the observer was always 
on the train, however hard he might try to get off it, but were 
nevertheless clinging to the view that there were times when 
no train was near and that it was reasonable to speculate about 
observations in such conditions. Einstein’s great advance in 
1905 was to recognize from the weight of evidence that a stage 
had been reached when too many conspiracies of circum- 
stances had to be assumed, and that it was better to take the 
known facts as they stood and generalize from them. 

* As usual, I mean a scientific law, not a legal one.

In the last resort we can never exclude this type of difficulty
entirely. The existence of sensations implies the existence
of an observer, and there is therefore always a theoretical
possibility that his presence produces effects that do
not exist otherwise. The practical reasons for ignoring this 
possibility are, first, that the presence of another observer 
does not as a rule alter the observations made by the first, 
which we should expect to happen if the observer had a 
disturbing influence; and second, that we do as a matter of 
fact proceed by describing and inferring sensations, and that 
the state of the world when not observed is not really relevant. 
But it is relevant that our laws lead to correct inferences 
whether or not we have in each experiment checked every 
intermediate step. When the constant of a tangent galvano- 
meter has been determined in terms of the rate of deposition 
of copper in an electrolytic cell, it is unnecessary to re- 
determine it during every later experiment with that galvano- 
meter. The scientific law being once established, subsequent 
inferences from it are made with the full probability of the 
law, and repeated verification is not needed. 

The influence of the observing conditions is seen again 
in the difference between experiment and observation. In 
a laboratory experiment there is usually a possibility of re- 
petition ; a result having once arisen, the apparatus can be set 
up again and we can see whether the same result ensues. If 
we have a prior belief in determinism we should expect 
this. Personally I do not think that a belief in determinism 
is a priori. What I think is established by such a repetition is 
that the result is independent of the time of the experiment. 
In astronomy, on the other hand, we cannot start the planets 
off again as they were and see whether they again describe 
the same orbits. This possibility of control over the initial 
conditions constitutes the difference between experiment and 
observation. It is a difference of technique, and not of prin- 
ciple. In the astronomical case it is equally well established 
that the accelerations are determined by the relative positions 
of the bodies and do not involve the time explicitly. If we 
could control the initial conditions, it might have been estab- 
lished with less trouble ; but it is established. 
Such judgments of independence are much commoner in
scientific inference than are ordinarily realized. In describing 
the result of an investigation we tend to restrict our speci- 
fication to the variables actually found to be relevant. In an 
electrical experiment we do not usually specify the time of 
day, the temperature outside the laboratory, the observer’s age, 
or the number of the laboratory assistant’s children. The reason 
for this is not a guess or a prior certainty that these factors are 
irrelevant. The reason is that in different experiments these 
factors are actually different, and the results are found to be 
the same. Now independence is the simplest possible scien- 
tific law, corresponding to the simplest differential equation 

$$\frac{dy}{dx} = 0.$$

Hence it always has a considerable prior probability, and 
therefore reaches practical certainty with a very small number 
of verifications. We expect things to be independent until 
the contrary is shown ; our interest is in discovering relevant 
variables, not in adding to the enormous number of irrelevant 
ones. 
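
The rate at which a law of considerable prior probability approaches practical certainty can be illustrated by a short calculation with Bayes' theorem. The Python sketch below is an illustration only: the prior probability 1/2 and the chance 1/10 that a verification would succeed by accident if the law were false are assumed figures, not values taken from the text, and the function name is hypothetical.

```python
from fractions import Fraction

def posterior_after(n, prior=Fraction(1, 2), chance_if_false=Fraction(1, 10)):
    """Posterior probability of the law after n successful verifications.

    Assumed for illustration: each verification is certain to succeed if
    the law holds, succeeds with probability chance_if_false otherwise,
    and the verifications are independent given either hypothesis.
    """
    evidence_if_false = chance_if_false ** n
    return prior / (prior + (1 - prior) * evidence_if_false)

# Print the posterior after 0, 1, ..., 4 verifications.
for n in range(5):
    p = posterior_after(n)
    print(n, p, float(p))
```

With these assumed figures three verifications already leave the law with probability 1000/1001; a law starting from a much smaller prior probability would need correspondingly more.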

The belief in determinism is related, I believe, to what 
philosophers call the Principle of Causality. It may be ex- 
pressed in the form : given the state of the world at any instant, 
the state at any subsequent instant is determinate. The truth 
of the principle requires some discussion of the meaning of 
state. The positions of all particles in the world at some instant 
obviously do not determine the motion afterwards unless we 
also know their velocities. But particles are not enough. The 
time must be the time of the event as understood in the theory 
of relativity. If a light is extinguished at an instant, it is still 
seen for a finite time at distant places ; the sensations produced 
are the same as if the light was still shining. If we consider 
the state of the system at an intermediate time, we must say 
that the illumination seen is caused by the light on the way,
for the lamp is no longer available as a cause. The state to be 
specified as determining the future must therefore include
the positions and directions of motion of light waves. The 
alternative is to say that the state of a system at any instant is 
determined, not by the state at each single previous instant, 
but by the aggregate of states at all previous instants. The 
position is tenable ; but now we see that the previous instants 
to be considered stretch right up to the instant of observation, 
and we may reasonably say that the state then is determined 
by the states at intervals indefinitely shortly before. But then 
the notion of light on the way becomes a necessity, and we may 
as well say at once that the law of causality is expressed by 
differential equations with regard to the time. If we insist on 
specifying the state only in terms of material particles we must 
consider laws as involving finite intervals of distance and 
time explicitly, and we meet the ancient question of action 
at a distance. It seems to me that the answer to this question 
can be given in terms of the principle that the form of 
quantitative laws is differential. The form of the properties of 
light away from a gravitational field is given by Maxwell’s 
differential equations. The fact that light has a constant 
velocity is a property of the solution of these equations. Thus 
the fact that two events at the same time at different places 
do not influence each other is a result explained by the law 
and scientifically valuable as helping to establish the law. It 
is at this point, I think, that Robb’s theory of conical order 
of events has its application. Again, the fundamental form 
of the law of gravitation is 

$$\nabla^2 U = -4\pi f \rho$$

on Newton’s theory, or its analogue on the general theory of
relativity; the integrated form

$$U = \sum \frac{f m}{r}$$

is not the fundamental form of the law, but its solution. Action 
at a distance seems to imply that the latter should be con- 
sidered fundamental, and this course, I think, is wrong. 
The denial of action at a distance in this sense does not
carry with it the acceptance of the notion of an ether. The 
latter concept was effectively that of an elastic solid capable 
of transmitting transverse waves with a constant velocity, and 
has broken down under later work. But the ideas of position 
co-ordinates and time, and of the electric and magnetic forces 
associated with them, arise of themselves, quite independently 
of the assumption of a quasi-material substance filling space. 
Our knowledge of electromagnetic phenomena indicates that 
they are related by differential equations, which in turn imply 
and explain the properties of light. The question of an ether 
does not arise. 

The principle of causality now becomes the aggregate of 
all scientific laws, whether already known or awaiting dis- 
covery. To accept it implies a hope that we may some day 
know all laws ; but that day is still distant. As a working rule 
it may be valuable for its psychological effect, but there is so 
far no definite reason for believing it true, and science can 
get on quite well without it. 

The words cause and because are on a different
footing, and have nothing to do with a general principle of
causality. If a scientific law involves a number of variables, 
then a knowledge of all but one of them determines that one. 
We say that it has a certain value because the others have 
certain values. The notions of cause and effect involve rather 
more than this; there is an asymmetry about them that is 
absent from the word because. Thus we may say either that a 
triangle has the angles at the base equal because it is isosceles, 
or that it is isosceles because the angles at the base are equal. 
When we speak of a cause and an effect, we pick out the one 
as the cause and the other as the effect, and they cannot be 
interchanged. The distinction seems to be one of time; the 
events under discussion are connected by a scientific law, 
and we pick out the earlier and call it the cause, and the later 
the effect. There is no distinction of cause and effect for
contemporaneous events. The definition of simultaneity on the
principle of relativity makes it possible to generalize this.
We have seen that two events that are simultaneous for one 
observer are not necessarily simultaneous for another ; but if 
two events are specified by position co-ordinates and time, 
$(x, y, z, t)$, $(x', y', z', t')$ for any observer, and we consider the
quantity

$$\tau^2 = c^2 (t - t')^2 - (x - x')^2 - (y - y')^2 - (z - z')^2,$$

then $\tau^2$ has the same value for all observers in the same
universe. If it is positive, we can say that the one event is
before or after the other, and it is possible for a message
travelling from one place with a velocity less than that of
light to reach the other place in time $|t - t'|$. If $t'$ is the
greater we say that the event $(x', y', z', t')$ is the later of the
two. If $\tau^2$ is negative, a message would have to travel with
a velocity greater than that of light, and no such velocity is 
known in physics. Then we say that neither event is before 
or after the other, and in fact we can find velocities of the 
observer that would make them simultaneous. Thus events 
are arranged in what Robb calls a conical order in terms of 
the invariant relations of before and after, where we say now
that one event is before or after another if it is before or after 
it to all observers. Then we can say that if there is a law con- 
necting two events, the earlier in this sense is the cause and 
the other the effect, and this is a definition that applies to all 
systems of reference. If two events are connected, but neither 
is before or after the other, we may use the word because, but
we cannot say that either is the cause and the other the effect. 
In such cases we can, of course, usually trace both to some 
cause earlier than either. 
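
The invariant ordering of events described above can be checked numerically. The following Python sketch is illustrative only: the function names are hypothetical, units are assumed in which the velocity of light is 1 by default, and only relative motion along the x axis is considered. The boundary case in which the invariant quantity is exactly zero is not discussed in the text and is here grouped with the ordered cases as an assumption.

```python
import math

def interval_squared(e1, e2, c=1.0):
    """c^2 (t - t')^2 - (x - x')^2 - (y - y')^2 - (z - z')^2 for two events."""
    x1, y1, z1, t1 = e1
    x2, y2, z2, t2 = e2
    return (c * (t1 - t2)) ** 2 - (x1 - x2) ** 2 - (y1 - y2) ** 2 - (z1 - z2) ** 2

def boost_x(event, v, c=1.0):
    """Co-ordinates of an event for an observer moving with velocity v along x."""
    x, y, z, t = event
    gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return (gamma * (x - v * t), y, z, gamma * (t - v * x / c ** 2))

def order(e1, e2, c=1.0):
    """Return 'before', 'after' or 'neither' for e1 relative to e2."""
    s2 = interval_squared(e1, e2, c)
    if s2 < 0 or e1[3] == e2[3]:
        return "neither"          # no signal slower than light connects them
    return "before" if e1[3] < e2[3] else "after"

origin = (0.0, 0.0, 0.0, 0.0)
later = (1.0, 0.0, 0.0, 2.0)      # can be reached by a signal slower than light
apart = (2.0, 0.0, 0.0, 1.0)      # cannot be reached by such a signal

print(order(origin, later), order(origin, apart))
# The same quantity computed by a moving observer:
print(interval_squared(boost_x(origin, 0.5), boost_x(later, 0.5)))
# An observer with velocity 0.5 finds the separated pair simultaneous:
print(boost_x(apart, 0.5)[3])
```

In the sense of the text, `origin` could be the cause of `later`, whereas `origin` and `apart` can be connected only by the word because.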

Related to the question of repetition is the case where the 
thing under observation is destroyed by the act of observa- 
tion. Thus when we analyse a chemical compound the final 
products are not the same as the original compound ; when 
we observe light it is absorbed in the eye and not re-emitted ; 
and when a neurosis is psychoanalysed the patient recognizes 
the relation of the symptom to his early conflicts, which are
no longer of practical importance to him, and the neurosis 
disappears. There is again no difficulty in principle when we 
recognize that our data are sensations. The chemical com- 
pound, light, and the neurosis are all concepts designed to 
explain sensations, and there is no difficulty about supposing 
that the concepts cease to exist when the sensations they were 
designed to explain no longer exist. We do need, of course, 
to recognize the memory of previous sensations among our 
data. 

10.3. Some reference may be made here to the practice in
mathematical physics of “neglecting small quantities” and 
arguing by “orders of magnitude”. Both methods are 
almost universally accepted by physicists; both are looked 
upon somewhat askance by pure mathematicians; and both 
are completely unintelligible to the man in the street, to 
whom the journalistic expression “mathematical accuracy” 
implies an entirely erroneous idea of what mathematics 
means. Thus the problem of “squaring the circle” still has 
its devotees ; some of the uninitiated try to solve it by methods 
that are known to be incapable of solving it, and others repeat 
the legend that the problem is still unsolved. I remember 
once seeing a claim in a popular science journal that, though 
π has been evaluated to a large number of decimal places, all
such estimates are wrong because they are not exact; the 
author proceeded to prove to his own satisfaction that it was 
exactly equal to 3.125. This is an extreme case; but there was
apparently a publisher willing to pay for printing the article, 
and presumably there was a public willing to buy the journal. 
We need not take this proposition seriously; we need only 
notice that it has an emotional value expressible in terms of 
hard cash. The apparent precision of the number 3.125 was
the attraction; the fact that π is known to be between 3.14159
and 3.14160 was considered irrelevant. On a somewhat
higher level, we have the candidate in the Mathematical
Tripos who attempts to do a problem in small oscillations
without neglecting the squares of small quantities till the 
very end. He never gets the right answer (even if he gets an 
answer at all), because he always makes a mistake in algebra. 
We have also the candidate who can do a complicated 
factorization but cannot prove the simplest inequality. In 
this we must detect an inherent tendency to trust the word 
equal, but a suspicion of greater than and less than, which is
scarcely exceeded by that directed against approximately equal. 

In the sense understood by the man in the street, exactness 
has almost disappeared from the subject-matter of modern
pure mathematics. It survives in projective geometry, which 
is really the study of sets of algebraic equations, and in the 
identification of high prime numbers. Modern analysis deals 
with infinite series and the behaviour of integrals and 
differential coefficients, all of which involve the notion of a 
limit. Thus the sum of an infinite series whose nth term is
$u_n$ is defined as the limit, if any exists, of the sum of the first
n terms when n becomes indefinitely large. The criterion that
the sum may be S is that, if we choose any positive quantity ε,
however small, we can find a value of n such that for all values
of m greater than n, the sum of the first m terms differs from
S by less than ε. While the sum S appears as an exact value,
it is the result of a limiting process, which depends essentially 
on a recognition of the meanings of greater than and less than. 
The solutions of most of the differential equations of physics 
are expressible as series possessing sums so defined, and their 
numerical values can be found by actually computing the 
series, term by term, till the desired accuracy is obtained. 
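
The criterion for the sum of a series can be tried on a simple case. The Python sketch below is an illustration under assumptions: the geometric series 1/2 + 1/4 + 1/8 + ..., whose sum is 1, is chosen for convenience and does not occur in the text, and the function name is hypothetical. Exact fractions are used so that the comparison with the chosen small quantity involves no rounding.

```python
from fractions import Fraction

def terms_needed(eps):
    """Smallest n for which the first n terms of 1/2 + 1/4 + ... are within eps of 1.

    The remainder after n terms is 1/2**n and decreases steadily, so the
    same n serves for every larger number of terms, as the criterion requires.
    """
    target = Fraction(1)
    total = Fraction(0)
    n = 0
    while target - total >= eps:
        n += 1
        total += Fraction(1, 2 ** n)
    return n

for eps in (Fraction(1, 10), Fraction(1, 1000), Fraction(1, 10 ** 6)):
    print(eps, terms_needed(eps))
```

However small the quantity chosen, a finite number of terms suffices; what is established is an inequality, not an equality.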

The physicist often looks at the summation of series from a 
different standpoint. He is not interested in the fact that the 
sum of n terms of the series has a definite limit when n 
becomes indefinitely large ; he has not the slightest intention 
of computing more than a certain finite number of terms. The 
existence of the function associated with the series is usually 
known already, and what the physicist needs is to evaluate 
it with the requisite degree of accuracy without a prohibitive
amount of labour. The mathematical property of convergence 
is neither a necessary nor a sufficient condition for physical 
utility. Thus if we consider the exponential series 


$$e^x = 1 + x + \frac{x^2}{2!} + \dots + \frac{x^n}{n!} + \dots$$

and put $x = -1000$, we have a series that converges in the
mathematical sense. But the terms increase numerically up
to $n = 1000$, and the actual computation would be hopeless.
Actually, of course, one writes

$$e^{-1000} = 10^{-1000 \log_{10} e}$$

and computes logio e from some such formula as 

$$1/\log_{10} e = \log_e 10 = 3\left(\log_e \tfrac{3}{2} + \log_e \tfrac{4}{3}\right) + \log_e \tfrac{5}{4}.$$

Convergence is therefore not a sufficient condition for utility. 
Nor is it necessary, for if we consider the series 

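$$e^{x^2}(1 - \operatorname{erf} x) = \pi^{-1/2}\left(x^{-1} - \frac{1}{2}x^{-3} + \frac{1 \cdot 3}{2^2}x^{-5} - \frac{1 \cdot 3 \cdot 5}{2^3}x^{-7} + \dots\right),$$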

the series on the right is always divergent. But it can be 
shown that the sum of the first n terms always differs from 
the function on the left by less than the last term retained. 
If X is large, the terms decrease to a minimum, and the smallest 
may be within the range of accuracy required. Thus for 
x = 2 the fourth term is about 1/34 of the first. Such series
are called asymptotic, and have been extensively studied in 
modern pure mathematics. But whereas the tendency of the
pure mathematician is to consider convergence as the generally 
important property, and the asymptotic property as a make- 
shift, physical utility makes the asymptotic property valuable 
and convergence unimportant. But the real test for physical 
utility is that the sum of the first n terms (where n is
preassigned, usually does not exceed 10, and often is 1), shall
differ from the function represented by less than the limits of
error permitted by other considerations. If this condition is 
satisfied it is no concern of the physicist’s how the later terms
of the series behave. If it is not satisfied he will have recourse 
to numerical solution of the differential equation. 
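
Both halves of this argument can be checked by direct computation. The Python sketch below is illustrative, with hypothetical function names. It assumes the standard asymptotic expansion e^{x²}(1 − erf x) = π^{-1/2}(x^{-1} − x^{-3}/2 + 1·3·x^{-5}/2² − ...), and it uses the library functions `math.lgamma` and `math.erfc` only as references against which the hand methods are compared.

```python
import math

LN10 = math.log(10.0)

def log10_exp_term(x_abs, k):
    """log10 of |x|**k / k!, the size of term k of the exponential series."""
    return k * math.log10(x_abs) - math.lgamma(k + 1) / LN10

def exp_via_log10(x):
    """Return (mantissa, exponent) such that e**x = mantissa * 10**exponent."""
    y = x * math.log10(math.e)
    exponent = math.floor(y)
    return 10.0 ** (y - exponent), exponent

def asymptotic_terms(x, n_terms):
    """First n_terms terms x**-1, -x**-3 / 2, 1*3*x**-5 / 2**2, ... of the bracket."""
    terms, term = [], 1.0 / x
    for k in range(n_terms):
        terms.append(term)
        term *= -(2 * k + 1) / (2.0 * x * x)
    return terms

def erfc_asymptotic(x, n_terms):
    """Approximate 1 - erf(x) from the first n_terms terms of the divergent series."""
    return math.exp(-x * x) / math.sqrt(math.pi) * sum(asymptotic_terms(x, n_terms))

# Convergent but useless: size of the largest term at x = -1000, against the sum.
largest = max(log10_exp_term(1000.0, k) for k in range(2001))
mantissa, exponent = exp_via_log10(-1000.0)
print(f"largest term about 10^{largest:.1f}; e^-1000 = {mantissa:.4f} x 10^{exponent}")

# Divergent but useful: four terms at x = 3, against the library value.
print(erfc_asymptotic(3.0, 4), math.erfc(3.0))

# Ratio of the fourth term to the first at x = 2.
t = asymptotic_terms(2.0, 4)
print(abs(t[3] / t[0]))
```

The largest term of the convergent series exceeds its sum by more than eight hundred powers of ten, so that direct summation would require a corresponding number of significant figures, whereas four terms of the divergent series at x = 3 agree with the library value to about one part in a thousand.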

The “neglect of small terms” in a differential equation
implies that the solution is in error to some extent, which 
will depend on the actual magnitude of these terms. What is 
certain is that the solution will remain approximate through- 
out a certain range of the independent variable ; the smaller 
these terms are, the longer the range will have to be before 
their integrals become large enough to affect the accuracy of 
the approximation to a given extent. In some cases we can 
prove rigorously that they will never do so: in others we 
cannot. It seems to me that the general theory of the degree 
of accuracy of these approximations is an important and 
almost unworked field of pure mathematics. The physicist’s 
method is to solve the problem first by neglecting them, and 
to substitute the result in the small terms to verify that they 
do remain small. 
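
The dependence of the error on the range of the independent variable may be seen in the problem of small oscillations. The Python sketch below is an illustration under assumptions: it takes a simple pendulum in units where the equation of motion is θ'' = −sin θ, released from rest, compares the solution θ₀ cos t obtained by neglecting the small terms with a numerical integration by the fourth-order Runge-Kutta method, and uses hypothetical function names and step length.

```python
import math

def pendulum_angle(theta0, t_end, dt=0.01):
    """Integrate theta'' = -sin(theta) from rest at theta0 by RK4; return theta(t_end)."""
    def deriv(theta, omega):
        return omega, -math.sin(theta)

    theta, omega = theta0, 0.0
    for _ in range(int(round(t_end / dt))):
        k1t, k1o = deriv(theta, omega)
        k2t, k2o = deriv(theta + 0.5 * dt * k1t, omega + 0.5 * dt * k1o)
        k3t, k3o = deriv(theta + 0.5 * dt * k2t, omega + 0.5 * dt * k2o)
        k4t, k4o = deriv(theta + dt * k3t, omega + dt * k3o)
        theta += dt * (k1t + 2 * k2t + 2 * k3t + k4t) / 6.0
        omega += dt * (k1o + 2 * k2o + 2 * k3o + k4o) / 6.0
    return theta

def linear_error(theta0, t_end):
    """Error of the small-oscillation solution theta0*cos(t), as a fraction of theta0."""
    return abs(pendulum_angle(theta0, t_end) - theta0 * math.cos(t_end)) / theta0

def neglected_ratio(theta0):
    """Size of the neglected term sin(theta) - theta relative to the retained term."""
    return abs(math.sin(theta0) - theta0) / theta0

# The physicist's check: substitute the amplitude into the neglected term.
print(neglected_ratio(0.1))
# The error grows with the range, and shrinks with the neglected term.
print(linear_error(0.1, 10.0), linear_error(0.1, 100.0), linear_error(0.01, 100.0))
```

For an amplitude of 0.1 the neglected term is under a fifth of one per cent of the retained one, yet its accumulated effect over the longer range is several per cent of the amplitude; reducing the amplitude tenfold postpones the same error to a much longer range.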

The use of “orders of magnitude” is a further departure 
from popular standards of accuracy. It usually consists 
essentially in the principle that if x varies from a to b, a
function f(x) varies from f(a) to f(b), and we can replace
its derivative f'(x) by its mean value

$$\frac{f(b) - f(a)}{b - a}.$$

If f'(x) is continuous this is true for some value of x between a and b;
but the method goes further. If we have a differential equa- 
tion we may carry out operations of this type on both sides 
of it and reduce the equation to a single algebraic equation. 
The result is necessarily inaccurate. Its utility is essentially 
in carrying out preliminary tests on a theory. If we get an 
agreement within a numerical factor of 5 or so we may say 
that the theory is worth closer examination ; if the two sides 
of the equation so obtained differ by a factor of 1000 or more, 
we consider that further investigation is unnecessary. There 
may be physical grounds, in a particular problem, for contrast- 
ing two hypotheses directly, and then the method of orders of 
magnitude will enable us to reject one and retain the other
without the trouble of carrying out an accurate investigation. 
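
A very simple case shows the procedure. The Python sketch below is an illustration, not an example from the text: it takes a body falling from rest under constant gravity, reduces the differential equation d²x/dt² = g to the algebraic relation h/t² = g by replacing the derivative by a mean value, and compares the result with the exact solution. The function names, the numerical value of g, and the label given to factors between the two thresholds quoted above are all assumptions.

```python
import math

def order_of_magnitude_fall_time(height, g=9.81):
    """Estimate from d2x/dt2 = g with the derivative replaced by height / t**2."""
    return math.sqrt(height / g)

def exact_fall_time(height, g=9.81):
    """Exact solution from rest: height = g * t**2 / 2."""
    return math.sqrt(2.0 * height / g)

def verdict(a, b):
    """Apply the rough thresholds of the text to two positive quantities."""
    factor = max(a / b, b / a)
    if factor <= 5:
        return "worth closer examination"
    if factor >= 1000:
        return "further investigation unnecessary"
    return "undecided"  # assumed label; the text gives no rule for this range

estimate = order_of_magnitude_fall_time(10.0)
exact = exact_fall_time(10.0)
print(estimate, exact, exact / estimate, verdict(estimate, exact))
```

Here the numerical constant omitted by the reduced equation is the square root of 2, well within the factor that the method is expected to tolerate.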

These methods hardly arise, I think, in the establishment of 
a physical law. They are concerned with the investigation of 
the competence of different causes to produce a given 
effect, the laws being already known. 

The term “order of magnitude”, in the physical sense,
means rather more than it does in modern pure mathematics.
Thus the pure mathematician may write an equation 

$$f(x) = \phi(x) + O(x^n),$$

where $f(x)$ and $\phi(x)$ are two known functions of $x$, and he
will say that their difference is of the order of magnitude
of $x^n$. He means that when $x$ tends to zero, the ratio
$\{f(x) - \phi(x)\}/x^n$ tends to a finite limit or zero. The limit may
be 1000. A physicist in such a case would not say that
$f(x) - \phi(x)$ is of the same order of magnitude as $x^n$, for he
probably wants the actual values of the functions when x is 
different from zero, and if the limit is a large number the 
utility of the approximation may be vitiated. The physicist’s 
meaning is more restricted in one way, though less precisely 
defined in another. He may say that two quantities are of the 
same order of magnitude when there is no question of a limit ; 
thus the masses of Jupiter and Saturn are of the same order 
of magnitude. Two quantities may be said to be of the same 
order when their ratio does not exceed 10; and the justifica- 
tion of the method is that the ratio of the mean values actually 
compared in the reduced equation is really a numerical 
constant arising in the solution, and that in practice such 
constants hardly ever do exceed 10. Exceptions sometimes 
arise : thus the condition that turbulence may persist in the flow 
of a fluid in a pipe involves a numerical constant of the order 
of 1000, but that is really because the solution of the problem 
involves not one equation, but a family of four differential 
equations, three of the second order and one of the first. 
Thus there may be such numerical constants as 7! or 5040.



CHAPTER XI 


OTHER THEORIES OF 
SCIENTIFIC KNOWLEDGE 

I have seen all the works that are done under the sun; and, behold, all 
is vanity and vexation of spirit. Eccles. i. 14 

A preliminary explanation is needed before entering on the 
topics of this chapter. The theories considered here are 
selected on account of their relation to the general aim of this 
book, which is to systematize the processes actually employed 
in the acquisition of knowledge by experience. They have in 
common, in my opinion, the feature that if they were accepted 
as practical rules of working they would make this acquisition 
impossible. In some cases they were expressed by their 
authors some time ago, and I am not in all cases in a position 
to know whether the respective authors still hold the views 
in question. For my purpose, however, it is the theories 
themselves that matter, rather than the personal question of 
whether the individual authors still hold them ; for in fact each 
theory still certainly has a number of professed adherents. 

11.1. The statistical theory of probability. In the present
work probability is regarded throughout as a property of the 
relations between propositions. Like such notions as force, 
interval of time, distance, electric current, colour, pitch of 
sound, and so on, it is immediately recognizable by conscious- 
ness in suitable circumstances. Like them also its treatment 
can be made quantitative, and its specification can thereby 
be made enormously more precise. The original meaning, 
however, is never lost. If we ignore it we are deliberately 
neglecting a piece of knowledge that we have, and are there- 
fore restricting the application of scientific method. This is 
more serious in the case of probability than with the other
concepts mentioned, because it is not the subject-matter of
a branch of science ; science is a branch of the subject-matter 
of probability. To ignore probability is to ignore the problem 
of scientific inference and to deprive science of its chief 
reason for existence. 

Many writers, following the late John Venn*, have at- 
tempted to avoid the notion of probability as a primitive con- 
cept by trying to define it in terms of the composition of 
samples. Venn considered that the notion of probability pre- 
supposes a series, the terms of which are indefinitely numerous 
and represent the cases of an attribute φ. From these one can
pick out a smaller class, the members of which possess the
further attribute ψ. If then we have chosen m members in
all, and l of them belong to the smaller class, the probability
of ψ given φ is defined as the limit of l/m when m becomes
indefinitely great. The form of this definition restricts the field
of probability very considerably. As a matter of simple fact, 
when we speak of probability we do not consider an inde- 
finitely large number of trials. In many cases, such as when 
we speak of the probability that the solar system was formed 
by the disruptive approach of two suns, or that the stellar 
universe is symmetrical, the idea of even one repetition is out 
of the question. Yet these are precisely the cases where the 
notion of probability is most valuable. 

But actually Venn’s definition suffers from a drawback that 
deprives it of all application whatever. If a definition is to 
be of any use it must imply a test, and we must be able to 
carry out that test. On the a priori view, when we say that 
the probability that a penny will come down heads is 1/2, we
make an immediate judgment. On Venn’s view we must 
throw it an infinite number of times and take the limit of the 
ratio of the number of heads to that of all throws, and nobody 
has had, or ever will have, time to do it. There is no case 
where the value of the probability, on Venn’s definition, is 
known, or even where it is known to exist. 

* Logic of Chance, pp. 162 et seqq.
We must remember that this view is designed to avoid
the need to treat probability as an undefined concept with 
a priori laws of its own. The undefined concept view gives a 
justification for the opinion that a large sample will probably 
be approximately a fair one; if we reject this view we also 
reject the justification that it gives, and must be prepared to 
find a new one. The question at issue is whether, apart from 
the a priori view, there is any reason to believe that the ratio 
considered in Venn’s view tends to any limit whatever. To 
say that it does is essentially an assertion about the result of 
an experiment that nobody has ever tried, or ever will try. 

It can be seen easily that, with any value of the probability 
whatever, other than 0 or 1, it is possible to have selections
that do not give any limit for the ratio. For if the ratio is to
tend, when m is large, to any limit between 0 and 1, the
numbers of things possessing and not possessing the attribute
ψ are both infinite. We cannot take the actual ratio of the
whole number of ψ’s to the whole number of φ’s to express
the probability, for both numbers are in fact infinite and
their ratio is indeterminate*. The method of proceeding to the
limit is essential to the definition. But if at any stage we are
able to select either a ψ or a not-ψ, it is possible to make the
limit anything whatever, or there may be no limit at all. If,
for instance, whenever a ψ occurs we write 1, and whenever
a not-ψ occurs we write 0, l/m will be the mean of the first m
terms in the series obtained. If they occur in such an order 
as to give the series 

1011000011111111...

where the number of digits in any block after the first is
equal to the number of digits that have occurred previously,
the ratio is about 2/3 at the end of each block of 1’s, and about
1/3 at the end of each block of 0’s. It therefore tends to no
limit whatever. Again, in one selection we may get the series

10101010...

which gives the limit 1/2; but from the same class we could
make the selection

100100100...

which gives the limit 1/3. The numbers of available 0’s and 1’s
being both by hypothesis infinite, there is no possibility of
exhausting either, so that such series are in fact possible. It
is therefore possible for Venn’s ratio to tend to the wrong
limit, or to give no limit whatever. The very existence of the
probability on Venn’s definition requires an a priori restriction
on the possible orders of occurrence of ψ’s and not-ψ’s.
No supporter of this view has succeeded in stating the nature
of this restriction, and even if it were done it would constitute
an a priori postulate, so that this view involves no
reduction of the number of postulates involved in the treatment
of probability as an undefined concept with laws of its
own.

* R. A. Fisher, with what looks like the courage of despair, says that
in a “hypothetical infinite population” the ratio is perfectly definite.
Cf. Phil. Trans. 222 A, 1922, p. 312.
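
The behaviour of the ratio in the selections just described can be verified by counting. The Python sketch below is illustrative, with hypothetical function names; it writes 1 for each member possessing the attribute and 0 for each member lacking it, and uses exact fractions for the ratio l/m.

```python
from fractions import Fraction
from itertools import cycle, islice

def doubling_blocks():
    """Yield 1, 0, 1 1, 0 0 0 0, ...: each block after the first is as long
    as all the digits that have occurred before it."""
    yield 1
    count, digit = 1, 0
    while True:
        for _ in range(count):
            yield digit
        count *= 2
        digit = 1 - digit

def running_ratio(digits, m):
    """Return l/m, the proportion of 1's among the first m digits."""
    return Fraction(sum(islice(digits, m)), m)

# Ends of the blocks of 1's fall at m = 4, 16, 64, ...; ends of the blocks
# of 0's fall at m = 2, 8, 32, ...
for m in (4, 8, 16, 32, 4 ** 8, 2 * 4 ** 8):
    print(m, float(running_ratio(doubling_blocks(), m)))

# Two orderly selections from the same class, with different limits.
print(running_ratio(cycle([1, 0]), 1000), running_ratio(cycle([1, 0, 0]), 999))
```

The ratio for the first selection swings between values near 2/3 and values near 1/3 however far the count is carried, while the two periodic selections settle at 1/2 and 1/3 respectively.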

The difficulties become worse when we attempt to combine 
probabilities, for then we have to face an indefinite repetition 
of infinite series. This is called by Venn the use of cross-series,
and forms an important part of his theory of inference. It is 
necessary, for instance, in giving a meaning to the proposition 
connecting the probabilities of a proposition referred to 
different data, 

$$P(p.q \mid h) = P(p \mid q.h)\,P(q \mid h).$$

For an infinite series is necessary to give an account of
$P(p \mid q.h)$, which is the limit derived from the frequency of
the truth of p among entities satisfying q and h. Such entities
are, however, only a part of those that satisfy h. Thus to
establish a meaning for the numbers $P(p.q \mid h)$ and $P(q \mid h)$
we must consider all entities satisfying h, whether they satisfy
q or not. Thus further series must be constructed to show how
often q is actually true, and this requires, according to Venn,
an infinite number of series of entities all satisfying h, so that
we can examine in one direction to find the frequency of p
given q and h, and in the other direction to find those of q
given h and of p.q given h. Thus the difficulty of obtaining
enough terms, an acute practical point in the simple case, is 
here intensified. Further, there is no more reason to believe 
in the existence of limits in this case than there was in the 
other; and the opinion that the limits, if they exist, will 
satisfy the relation is justified only if the samples are made 
according to some special rule. The difficulties are merely 
complicated and not removed by the use of cross-series ; and 
the statistical theory of probability becomes a network of 
begged questions. 
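
For a finite class of entities the product relation holds identically between the observed frequencies, whatever the sample; the difficulty described above arises only when limits of infinite series are required. A minimal sketch in Python follows; the population and the helper `freq` are hypothetical and serve only as illustration.

```python
from fractions import Fraction

# Hypothetical finite population satisfying h: each entity records (p holds, q holds).
entities = (
    [(True, True)] * 3
    + [(False, True)] * 5
    + [(True, False)] * 2
    + [(False, False)] * 10
)


def freq(event, given=lambda e: True):
    """Frequency of `event` among the entities that satisfy `given`."""
    pool = [e for e in entities if given(e)]
    return Fraction(sum(1 for e in pool if event(e)), len(pool))


pq_given_h = freq(lambda e: e[0] and e[1])             # frequency of p.q among h
q_given_h = freq(lambda e: e[1])                        # frequency of q among h
p_given_qh = freq(lambda e: e[0], given=lambda e: e[1])  # frequency of p among q.h

print(pq_given_h, p_given_qh * q_given_h)
```

The identity is mere counting here; it says nothing about whether the corresponding limits exist when the class is infinite.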

There is a question of the theory of probability, treated as 
an undefined concept, that is related to the question of the 
existence of Venn’s limit. If the probability of ψ given φ is r,
and is the same however many instances have been examined,
what is the probability that when the sample becomes indefinitely
large the ratio l/m does tend to the limit r? To say
that it does so means that for any quantity ε, however small,
we can find a number m₀ such that, for all values of m greater
than m₀, l/m lies between r − ε and r + ε. What is the probability of this
proposition? It has not, so far as I am aware, been evaluated,
and a determination would be interesting. It is not rigorously 
unity, since it has already been shown that there are possible 
samples that do not satisfy the proposition. It may, however, 
differ from unity either finitely or infinitesimally. If the 
difference is finite, the Venn definition loses the last of its 
justification from the undefined concept view. If it is in- 
finitesimal, we might, if we thought the definition worth 
saving, save it at the cost of admitting infinitesimal prob- 
abilities different from zero. 

11·2. Keynes's theory of probability. To those already familiar
with J. M. Keynes’s Treatise on Probability (1921) it will be 
obvious that the point of view of the present work is very 
similar in some places and very different in others. The task
of comparing the developments explicitly, point by point, 
would be too formidable, but could for the most part be 
achieved by the reader sufficiently interested to carry out a 
direct comparison. Keynes agrees with me in regarding prob- 
ability as an undefined concept, really following De Morgan 
and Jevons, with a series of earlier writers going back to 
Leibnitz. He differs from the earlier writers, and from me, 
in refusing to admit that all probabilities are expressible by 
numbers. This amounts to denying the postulate of the 
present theory, that of any two probabilities one is greater 
than, equal to, or less than the other ; or the equivalent, that 
of any three unequal probabilities one is between the other 
two. Granting this proposition, it has been proved in this 
work that probabilities can be uniquely associated with 
numbers. Keynes’s alternative is something like the view 
that probabilities resemble places on the earth’s surface; we 
might say that New York and London are both between the 
North and South Poles, but neither New York nor London 
is between the other and the North Pole. It seems to me 
that all probabilities actually are comparable and that 
Keynes is merely creating difficulties. He manages to pre- 
serve the form of the probability of the disjunction of two 
propositions by defining addition in terms of it ; that is, the 
proposition 

P(p.q | h) + P(p.~q | h) = P(p | h),

which to me is a law connecting numerical estimates of prob- 
ability, is to Keynes the definition of addition, and the terms 
in it may not be numbers at all. Similarly the law 2·32 (4) is
converted into a definition of multiplication. The mathe- 
matical development remains much the same ; the only ques- 
tion is whether the results mean anything. Thus on Keynes’s 
views probabilities might be complex numbers ; and then it 
is possible that inequalities involving products, which are 
true for real numbers, may break down, and arguments based 
on the approach of probability to certainty with repeated
verification may fail. But my real objection to Keynes’s
postulate is that it is one of those attempts at generality that 
in practice lead only to vagueness. 

It might be held that, since different people do appear to 
assess probabilities differently, Keynes’s postulate might fit 
the assigned probability instead of the true probability. But 
I do not think that this is the case. We know people who 
appear to assess all probabilities at either 0 or 1; we know
others who seem to assess them all at ½ whatever the available
evidence; and there may be some who assign the probability
1 to their own hypotheses and ½ to all those of other people,
unless of course the latter happen to contradict their own,
when their probability is 0. But such estimates do not follow
the quantitative rules connecting the probabilities of pro- 
positions referred to different data, and can only be under- 
stood by introducing psychological considerations. I think 
the correct attitude to them is that they are simply wrong, 
just as it is possible to get a wrong answer in solving an 
algebraic equation. 

For some other comments on Keynes’s work I refer to 
Nature, Feb. 2, 1922, 132-133.

11·21. There is just a possibility that probabilities may in
certain circumstances require for their expression more 
numbers than the real numbers. Just as the real numbers are 
more numerous than the rational fractions, it is possible to 
define continuous series with more members than the real 
numbers, and yet satisfying the condition that of any three 
members of the series one is between the other two. But this 
is a degree of generality that has not yet required recognition. 

11·3. Phenomenalism. This theory of knowledge may be
defined by the rule that nothing is to be supposed to exist 
that cannot be reduced to descriptions of sensations. It may 
be traced back to the mediaeval writer William of Ockham, 
who said, “Entities are not to be multiplied beyond necessity”,
and as such was probably a reaction against the disposition
of primitive man to postulate an independent god or
demon as a cause for everything he could not understand. In 
its modern form it is effectively due to Ernst Mach and Karl 
Pearson, whose discussions of the bases of mechanics led more 
than anything else to the recognition of the need to define force 
and mass in terms of actual experience, so far as possible, and 
to the dropping of such ideas as absolute position and ether. 
Having myself started from the phenomenalist position, I 
must express my great indebtedness to these writers, but I 
consider that the pure phenomenalist attitude is not adequate 
for scientific needs. It requires development, and in some 
cases modification, before it can deal with the problems of 
inference. We must, as has been said already, always dis- 
tinguish between sensations actually experienced and those 
inferred from other sensations. The former can be described ; 
the latter can only be inferred with greater or less degrees of 
probability. Mach hardly considers the question of prob- 
ability ; Pearson does not go beyond Laplace's theory. It has 
been shown here that a requisite of any satisfactory theory of 
inference, as actually carried out in scientific work, is a re- 
cognition of the high prior probability of the simple law. 
There is no harm in concepts that cannot be defined as classes 
of sensations, provided that a few of them will help in de- 
scribing a large number of sensations. This is the test of the 
scientific validity of a concept ; philosophical reality has no- 
thing to do with it. An electron, for instance, is a valid scientific 
concept; I think that it is merely playing with words to say 
that it is a class of sensations, or that it can be described in 
terms of sensations. The same applies to the matter at the 
centre of the earth, or to the state of the earth just after its 
formation; both enable us to co-ordinate sensations actually 
experienced and are therefore admissible concepts. 

11·4. The theories of Russell and Whitehead. Mr Bertrand
Russell, in Mysticism and Logic (1917), tries to tackle the 
problem of actually defining objects in terms, not exactly of
sensations, but of sense-data, which are effectively sensations
with the errors of observation removed. Physical objects 
still cannot be adequately defined as the class of those sense- 
data that, in ordinary language, would be said to be percep- 
tions of it, for then the object would change when new aspects 
of it are observed, and this is not to be allowed. Therefore he 
considers the object defined in terms of all possible aspects 
of it; these aspects are called sensibilia, and resemble sense-data
in everything except that the majority of them are not
perceived. A physical object is then a class of sensibilia. 

From the practical scientific standpoint the weakness of 
this attitude is that we do not know what the sensibilia are 
like. An object, on this theory, could never be described until 
we had a knowledge, by experience, of all its aspects, per- 
ceived and unperceived, and this is inherently contradictory. 
Even the perceived sensibilia, or sense-data, cannot be de- 
scribed in terms of sensations until we have some rule for 
removing the errors of observation. The unperceived ones 
are necessarily never known directly, but have to be inferred 
from the perceived ones ; and this can be done only by using 
the laws of physics, inferring the nature of the object, and 
then proceeding to the unperceived sensibilia. The physical 
object and the laws of physics are anterior in knowledge to 
the sensibilia, and Mr Russell’s theory, whether it is logically 
consistent or not, is not a theory of scientific knowledge. 

In Prof. Whitehead’s theory* events, instead of sensibilia, 
are the fundamental entities. Each event contains other 
events, so that we can specify series of events such that each 
event in a series surrounds all after it. The limit of such a 
series is a point-event, and it is to such point-events that the 
laws of physics are supposed to apply. But the notion of a 
limit requires an infinite class, and an infinite class of ob- 
servations is impossible in practice. 


* An Enquiry into the Principles of Natural Knowledge, 1919.
We may say that it is never possible to construct a valid
theory of knowledge that involves the use of infinite classes 
of empirical data. The objection is similar to that given by 
Poincaré* in his criticisms of Cantor’s theory of infinite
numbers. Poincaré argued that it is impossible to assert anything
about a class, and in particular anything about the
number of its members, until every member of the class has 
been defined in words ; and as only a finite number of entities 
can ever be defined in words, it is impossible to know any- 
thing about an infinite class, so that there can be no knowledge 
of infinite numbers. The argument, as it stands, is not valid 
against Cantor’s theory, for in order to make an assertion 
about a class it may not be necessary to have definitions of all 
the members separately ; often a general proposition about all 
members can be asserted or postulated, and is enough for the 
purpose. Poincaré, indeed, seems to have overlooked the fact
that if his argument were sound it would also destroy the whole 
theory of infinite series and of differentiation and integration ; 
thus little would be left of higher pure mathematics. Thus the 
convergence of a series depends on the proposition that the 
sums of the first n, n + 1, n + 2, ... terms, for some value
of n, all differ from a certain number, called the sum of
the series, by less than a fixed quantity ε. These sums are
infinite in number, and hence it would be impossible, if
Poincaré’s assumption were granted, ever to prove that a
series is convergent. This result is, of course, quite unaccept- 
able. But the argument would go even further than this. 
Nobody has had time in his life to construct definitions of 
every member of a class of a million members; and as a 
number is merely a property of a class, it should be impossible 
to prove that, for instance, 

1 000 001² = 1 000 002 000 001.

Thus the argument would also invalidate most of arithmetic. 
If therefore we believe that the propositions of arithmetic 

* Science et Méthode, 1908, 192-214.
have some meaning and are true, we cannot accept Poincaré’s
objection to the theory of infinite numbers. 
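
The arithmetical instance is easily checked by a general proposition rather than by enumerating a class of a million members. A small sketch in Python, offered only as an illustration (the variable names are arbitrary), compares direct multiplication with the expansion (a + 1)² = a² + 2a + 1 for a = 1 000 000.

```python
a = 10**6
n = a + 1

square = n * n                 # direct multiplication
expanded = a * a + 2 * a + 1   # (a + 1)^2 expanded by the binomial rule

print(square, expanded, square == expanded)
```

Neither route requires any member of the class to be defined separately.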

But while the argument is wrong in this case, it is clearly 
valid when our only source of information about the members 
of a class is empirical ; for the total number of observations 
a person can make in his life is finite, and hence his ex- 
perience alone can never tell him anything about all the 
members of an infinite class of entities. Any proposition 
about such a class, or about all its members, is necessarily 
either wholly a priori or else an inductive generalization, 
and neither known directly nor obtainable from experience 
by the principles of pure logic alone. The fundamental data 
of any branch of science must consist of a finite number of 
observational results and some a priori postulates. 

One consequence of this is that we can never prove the 
existence of a limit to which a series of entities known by 
experience may tend, for in order to establish the existence 
of such a limit we should need to have knowledge that an 
infinite number of such entities are within a definite distance 
of that limit. This by itself would not be a fatal objection to 
any such theory, for there seems to be no possibility of con- 
structing a theory of knowledge without some assumptions, 
and it may be considered that in the case in question certain 
conditions are satisfied under which the existence of a limit 
is known a priori. But what is fatal is that in physical pro- 
blems we do not merely want to know that the limit exists ; 
we want its value according to some definite system of 
measurement, and that value can never be known a priori;
indeed, if it were, there would be no need to make measure- 
ments at all. Thus if a limit is ever used in a scientific theory, 
its value and all propositions about it are neither a priori nor 
known by experience, and therefore are not primitive pro- 
positions that can be used in a theory of knowledge based on 
experience. It is seen that this consideration rules out at 
once the statistical definition of probability, with the theories 
of Russell and Whitehead just mentioned. 
PROBABILITY IN LOGIC AND
PURE MATHEMATICS 

By convention it has been decided that if the proposition p
implies q, the probability number P(q | p) = 1. The word
implies is used in the ordinary sense, namely, that if p is true, 
then q is true. This is the definition given by Whitehead and 
Russell. There is, however, a slight difficulty. Whitehead and 
Russell prove the propositions 

p implies that q implies p,

~p implies that p implies q.

These are often read “a true proposition is implied by every
proposition” and “a false proposition implies every proposition,
true or false”. But when we analyse these propositions
in terms of the definition, they become

If p is true, then if q is (also) true, p is true. 

If p is untrue, then if p is true, q is true. 

The first is now seen to mean simply that additional (true) 
information does not contradict a proposition already known 
to be true ; its paradoxical appearance is gone. It is expressed 
in our rule 

P(p | p.q) = 1. (3)

The second, on the other hand, does not enable us to infer 
q without the knowledge that p is both untrue and true ; and 
this circumstance fortunately never arises. But formally it 
appears to require 

P(q | p.~p) = 1; P(~q | p.~p) = 1, (4)

and therefore

P(q v ~q | p.~p) = 2, (5)

whereas no probability can exceed 1. Similarly we could write
~p for q in (3) and get

P(p | p.~p) = 1, (6)

and replacing p by ~p,

P(~p | p.~p) = 1. (7)

It seems that contradictions are inevitable if we adhere to 
these propositions and allow contradictory propositions to 
appear among the data simultaneously. I think the correct 
convention in these circumstances would be that the prob- 
abilities are simply indeterminate. Thus we have 
P(p.~p.q | h) = 0

= P(q | p.~p.h) P(p.~p | h),

and P(p.~p | h) = 0. Hence P(q | p.~p.h) is of the
form 0/0 and therefore is indeterminate.
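
The convention can be put in computational form. The following Python sketch is an illustration only; the function `conditional` and its use of `None` for the indeterminate case are assumptions of this sketch, not notation from the text. It obtains a probability on augmented data by dividing the joint probability by that of the added datum, and declines to return a number when both vanish.

```python
from fractions import Fraction


def conditional(joint, evidence):
    """Return P(q | e.h) = P(q.e | h) / P(e | h), or None when the ratio is 0/0."""
    if evidence == 0:
        if joint != 0:
            raise ValueError("P(q.e | h) cannot exceed P(e | h)")
        return None  # indeterminate: contradictory or impossible data
    return joint / evidence


ordinary = conditional(Fraction(1, 4), Fraction(1, 2))
contradictory = conditional(Fraction(0), Fraction(0))
print(ordinary, contradictory)
```

With the added datum p.~p both numerator and denominator are zero, and no value such as 1 or 2 is asserted.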

So far as the theory of scientific method is concerned, the 
point is, of course, purely academic. Our estimates of prob- 
ability are always to be based ultimately on a priori principles 
and sensations, which are never mutually contradictory, so 
that the difficulty can never give any trouble in practice. 

It might be suggested that the statement “p implies q”
means more than “if p, then q”, and requires an actual
proof that the relation of implication holds for the propositions
p and q; until this is given, the probability of q given p is less
than unity. I think that this is a wrong attitude. Consider
some undemonstrated proposition of pure mathematics, such
as Fermat’s last theorem, or the proposition that the thousandth
decimal in the expression for e is zero. The data in each case
are perfectly definite. In the latter case it is known how 
the proposition could be tested if anybody was sufficiently 
interested to do the work; in the former all the powers of 
the natural numbers are perfectly definite, and it is only a 
question of whether actually, with a value of r greater than 2,
two numbers x and y exist such that $(x^r + y^r)^{1/r}$ is a whole
number. Each proposition is simply true or false on the data
of pure mathematics themselves; a proof does not affect their
truth-values, but merely finds out what they are. The probability
of Fermat’s last theorem, given the data of pure
mathematics, is therefore either 0 or 1; we simply do not
know which. The proposition “it can be proved that Fermat’s
last theorem is true”, on the other hand, is different from the
proposition “Fermat’s last theorem is true”, for it introduces
the question of the possibility of proof, which is a question 
of the capabilities of the human mind, and a legitimate field 
for scientific investigation based on experience. In view of 
the efforts that have been made to prove the theorem, we 
may say that the probability of this proposition is small, 
though not absolutely zero. 
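
The test of the proposition about the thousandth decimal of e is a matter of labour only. A sketch in Python of how it might be carried out is given below; the function name `e_digits` and the use of ten guard digits are assumptions of this sketch. It sums the series 1 + 1/1! + 1/2! + ... in whole-number arithmetic; the guard digits absorb the truncation errors, so that the result is very probably, though not demonstrably, correct in its last place.

```python
def e_digits(places, guard=10):
    """Return e truncated to `places` decimals, as an integer with the point removed."""
    scale = 10 ** (places + guard)
    total, term, k = 0, scale, 0
    while term:          # term is floor-divided towards scale / k!
        total += term
        k += 1
        term //= k
    return total // 10 ** guard


digits = str(e_digits(1000))
thousandth = int(digits[-1])   # the thousandth decimal of e
print(digits[:16], thousandth)
```

The calculation finds out the truth-value of the proposition; it does not alter it.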
INFINITE NUMBERS

The following remarks are not intended as a full account of
the modern theory of infinite numbers. This book is meant
mainly for theoretical and experimental physicists, and for
their purposes a brief summary is probably all that is needed.
If more is required, G. Cantor’s Transfinite Numbers, or Littlewood’s
Elements of the Theory of Real Functions, may be read;
a full account is in Whitehead and Russell’s Principia Mathematica.
A glance inside is worth while, as the inside is even
more impressive than the outside.

The fundamental notion involved in number is that of
comparison of classes. If we have two classes α and β, such
that they can be arranged so that to every member of α
corresponds one member of β, different α’s corresponding to
different β’s, then the number of members of α is less than
or equal to that of β, and that of β is greater than or equal to
that of α. If the classes can also be arranged so that to every
member of β corresponds one of α, then the classes are said
to be equal in number, and an arrangement can be found so
that each member of either class corresponds to one of the
other, none at all being left over. The smallest infinite number
is the number of the positive integers; this is called $\aleph_0$. We
can prove that $\aleph_0$ is also the number of the rational fractions.
For we can arrange the rational fractions thus:

1/1, 1/2, 2/1, 1/3, 3/1, 1/4, 2/3, 3/2, 4/1, 1/5, 5/1, ....

Here we first of all group together those fractions with the 
sum of the numerator and denominator the same, and arrange 
the groups so that this sum is greater in the fractions of one 
group than in those of any earlier group. In each group we 
place the fractions in order of increasing numerator. This 
arrangement includes every rational fraction, and they are
put in a definite order, so that every fraction is reached in a
finite number of steps from the beginning. The positive
integers 1, 2, 3, ... can therefore be placed against them. A
one-one correspondence is therefore established between the 
rational fractions and the positive integers, and the two 
classes therefore have the same number. 
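
The arrangement just described can be sketched in Python as follows. The generator name `rationals` is hypothetical, and the omission of fractions not in their lowest terms (so that each rational appears once only) is an assumption of this sketch.

```python
from fractions import Fraction
from itertools import count, islice
from math import gcd


def rationals():
    """Yield the positive rationals grouped by numerator + denominator."""
    for total in count(2):            # groups in order of increasing sum
        for num in range(1, total):   # increasing numerator within a group
            den = total - num
            if gcd(num, den) == 1:    # skip repeats such as 2/2
                yield Fraction(num, den)


first = list(islice(rationals(), 14))
print(", ".join(f"{q.numerator}/{q.denominator}" for q in first))
```

Every fraction is reached after a finite number of steps; 3/4, for example, is the fourteenth member.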

It can be shown similarly that the number of numbers that 
are the roots of algebraic equations with rational coefficients 
is $\aleph_0$. For an equation may first be multiplied by the lowest
number that will clear it of fractions. For each equation we 
can take the sum of the absolute values of the coefficients 
and the degree, and we can arrange the equations in groups 
according to increasing values of this sum. The number of 
equations in each group is finite, and the total number of their 
roots is also finite. Thus we can arrange both the equations 
and their roots so that every member is reached in a finite 
number of steps from the beginning; the number of algebraic
equations, and the number of their roots, are therefore
both $\aleph_0$.

Similarly the number of differential equations of finite 
order and degree, such that each coefficient is capable of $\aleph_0$
values, is $\aleph_0$. For we can begin by arranging the $\aleph_0$ values of
each coefficient so that they correspond to the whole numbers, 
and then replacing the values by the whole numbers them- 
selves. This gives a new class of differential equations with 
the same number as the first. Now rationalize each equation. 
Then form for each equation the sum of the order, the degree, 
and all the coefficients. Arrange the equations in groups, 
according to increasing values of this sum. The number of 
equations in each group is finite, and therefore we can arrange 
the equations so that every member is reached in a finite 
number of steps from the beginning ; and the total number of 
these equations is infinite. Hence the number is $\aleph_0$.

If we have two classes of numbers m and n, we can form 
pairs of things one from each class. The possible ways of 
forming such pairs are mn in number. This is taken as the 
definition of the product of two numbers. We can prove that
the product of $\aleph_0$ by any finite number or by itself is also $\aleph_0$.
Considering the latter proposition, $\aleph_0 \cdot \aleph_0$ would mean the
number of pairs of the form (x, y), where x and y are whole
numbers. We can arrange these pairs in groups for which
x + y has the same value, and then arrange the pairs in each
group in order of x increasing. In this way the pairs are
arranged in order so that each is reached in a finite number
of steps from the beginning, and their number is infinite.
Hence their number is $\aleph_0$. Thus $\aleph_0 \cdot \aleph_0 = \aleph_0$.
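
The place of any pair in this arrangement can be written down explicitly. The Python sketch below is an illustration only; the function name `pair_index` is hypothetical, and the whole numbers are assumed to begin at 1.

```python
def pair_index(x, y):
    """Position, counted from 1, of the pair (x, y) when pairs are grouped by x + y."""
    s = x + y
    earlier = (s - 1) * (s - 2) // 2   # pairs in all groups with a smaller sum
    return earlier + x                 # within a group, x increases from 1


positions = sorted(pair_index(x, s - x) for s in range(2, 7) for x in range(1, s))
print(positions)
```

The fifteen pairs with x + y not exceeding 6 occupy the places 1 to 15 without gap or repetition, which is the one-one correspondence required.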

Suppose next that we have two classes of numbers m and n.
With any member of the first class we can associate one of
the second class in n ways; with another member of the first
class we can associate one of the second in n ways (repetitions
being allowed). Then we can say that the number of ways of
covering the two together by members of the second class
is $n^2$. If we consider every member of the first class associated
with every member of the second, the whole number of ways
of carrying out such pairings is called $n^m$. This operation is
called exponentiation. In particular if m numbers are to be
assigned and each of them can take n values, the total number
of ways of assigning values to all of them is $n^m$.

The number of decimals to the base n is $n^{\aleph_0}$. For in each
decimal there are $\aleph_0$ places to be filled, and n possible numbers
0, 1, 2, ..., n − 1 can be placed in any place, irrespective of
what numbers are in any other place. If we assume that any
real fraction, rational or irrational, can be expressed as a
decimal with any base, it follows that the number of real
fractions is $n^{\aleph_0}$, where n may be any whole number greater
than 1. Also, since our fraction can be reduced in particular
to the base 2, we have

\[ n^{\aleph_0} = 2^{\aleph_0} = C, \]

say. This number C, the number of real fractions, is called 
also the number of the continuum. 

It can be proved that if n is the number of a class, $2^n$ is
always greater than n. We need this result especially for the
case where $n = \aleph_0$. Suppose, if possible, that $2^{\aleph_0}$ was equal
to $\aleph_0$. This would mean that all real numbers could be so
arranged that they corresponded one by one to the whole
numbers in ascending order. Then imagine each converted
to a decimal to the base 2. In each place the digit is either
0 or 1. We can now construct a decimal that differs from the
first of the series in the first place, from the second in the
second place, and so on. This decimal will then differ from
every member of the series that was supposed to include all.
It follows that $2^{\aleph_0} > \aleph_0$.
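
The construction can be shown on a finite scale. In the Python sketch below the list of binary decimals and the function name `diagonal_complement` are hypothetical; the sketch only exhibits the rule, and the argument of the text applies it to a supposed complete infinite list.

```python
def diagonal_complement(rows):
    """Return a binary string that differs from rows[k] in place k, for every k."""
    return "".join("1" if row[k] == "0" else "0" for k, row in enumerate(rows))


listing = ["0000", "1111", "0101", "1010"]
missing = diagonal_complement(listing)
print(missing, missing in listing)
```

Whatever list is offered, the decimal so formed differs from its k-th member in the k-th place and therefore cannot occur in it.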

It follows at once that $2^{\aleph_0} > \aleph_0^n$, where n is any finite whole
number. For by repeated application of the result that
$\aleph_0^2 = \aleph_0$ we can show that $\aleph_0^n = \aleph_0$.

The same set of things may be arranged in different ways. 
If an arrangement is such that each member of the series has 
an immediate successor, we say that the series is well-ordered. 
Thus the whole numbers i, 2, 3, ... in ascending order of 
magnitude constitute a well-ordered series; for to each number
there corresponds a “next” such that the latter follows
it and there is none between. The rational fractions, or the 
algebraic numbers, in ascending order, are not well-ordered, 
because between any two there lie an infinite number of 
others. These series are called dense. The whole numbers in 
ascending order are said to be of ordinal type ω; the rational
fractions or the algebraic numbers between 0 and 1 in ascending
order, omitting 0 and 1 themselves, are said to constitute
a series of ordinal type η. It is doubtful whether every class
can be arranged so as to be well-ordered; the doubt extends
to the continuum itself. The continuum of real numbers
between 0 and 1, omitting the extremes, is said to be of type θ.

Little progress has been made with the theory of the ratios 
of infinite numbers. It appears to be very doubtful whether 
such ratios could be defined so as to satisfy the ordinary 
rules of fractions; we should wish, that is, to be able to 
multiply or divide numerator and denominator by the same 
number without altering the fraction. But if we try to do so
here we get, for instance,

\[ \frac{1}{1} = \frac{\aleph_0}{\aleph_0} = \frac{\aleph_0^2}{\aleph_0} = \frac{\aleph_0}{1}. \]

In the circumstances it seems undesirable to admit them to 
the theory of scientific method, at least until they have some 
recognized status in pure mathematics. 

The number of functions of a real variable is $C^C$. For to
every value of the variable may correspond any of C values
of the function; and if the variable can assume any value
within a continuum, it has C possible values. Hence by
definition the number of functions is $C^C$.

The number of continuous functions is C. For if a continuous
function is assigned for all rational values of x it is
determined for all other values. For each rational value of x,
the function can take C values. Hence it can be assigned for
all rational values in $C^{\aleph_0}$ ways. But

\[ C^{\aleph_0} = (2^{\aleph_0})^{\aleph_0} = 2^{\aleph_0 \cdot \aleph_0} = 2^{\aleph_0} = C. \]

The same is true for analytic functions, even when each coefficient
can take only $\aleph_0$ values. For the number of such
functions, n say, is less than or equal to C, since they form
only a part of all analytic functions. But their number is
$\geq 2^{\aleph_0} = C$. Thus

\[ C \geq n \geq C, \]

and therefore n = C.
THE ANALYTIC TREATMENT OF
THE SINE AND COSINE 


We define cos x and sin x by their expansions in series

\[ y_0 = \cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \dots, \qquad (1) \]

\[ y_1 = \sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \dots. \qquad (2) \]

Both series are absolutely convergent and differentiable term
by term for all values of x. We see at once that

\[ \frac{d}{dx}\cos x = -\sin x; \qquad \frac{d}{dx}\sin x = \cos x, \qquad (3) \]

and that both cos x and sin x satisfy the differential equation

\[ \frac{d^2y}{dx^2} + y = 0. \qquad (4) \]

If we multiply this by $2\,dy/dx$ the left side becomes a perfect
differential and we infer that

\[ \left(\frac{dy}{dx}\right)^2 + y^2 = \text{constant}. \qquad (5) \]

Whether y is taken to be cos x or sin x, we can substitute for
its derivative from (3); hence

\[ \cos^2 x + \sin^2 x = \text{constant}; \qquad (6) \]

and putting x = 0 we see that the constant must be 1. Thus

\[ \cos^2 x + \sin^2 x = 1. \qquad (7) \]

When x = 0, $y_0 = 1$, $dy_0/dx = 0$, and $d^2y_0/dx^2 = -1$. Hence for a range
of positive values of x, $y_0$ is positive and decreases as x increases.
It must therefore either (1) vanish for a finite value of x,
(2) begin to increase again without vanishing, or (3) tend to
a finite limit less than 1 as x increases indefinitely. Alternative
(2) implies that $dy_0/dx$ vanishes for two values of x,
0 and another, and therefore $d^2y_0/dx^2$ must vanish between
them. But this cannot be the case, because $d^2y_0/dx^2 = -y_0$
and $y_0$ is by hypothesis positive throughout the range.
Alternative (3) implies that $dy_0/dx$ tends to zero and therefore
again $d^2y_0/dx^2$ must vanish for a finite value of x, which is
again contradicted by the supposition that $y_0$ is always positive.
Hence there is a value of x that makes $y_0$ zero. We call
this ½π. Evidently when x = ½π, sin x = 1 by (7). We have
from (5) and (7)

\[ \frac{dx}{dy_0} = \pm\frac{1}{\sqrt{1 - y_0^2}}, \qquad (8) \]

whence by integrating and introducing the limits

\[ \tfrac{1}{2}\pi = \int_0^1 \frac{dy_0}{\sqrt{1 - y_0^2}}. \qquad (9) \]

Now consider the function

\[ f(x_1, x_2) = \cos x_1 \cos x_2 - \sin x_1 \sin x_2, \qquad (10) \]

and put $x_2 = x - x_1$. Then

\[ f(x_1, x - x_1) = \cos x_1 \cos (x - x_1) - \sin x_1 \sin (x - x_1). \qquad (11) \]

Differentiating with regard to $x_1$, we have

\[ \frac{\partial}{\partial x_1} f(x_1, x - x_1) = -\sin x_1 \cos (x - x_1) + \cos x_1 \sin (x - x_1) - \cos x_1 \sin (x - x_1) + \sin x_1 \cos (x - x_1) = 0. \qquad (12) \]

Thus $f(x_1, x - x_1)$ is a function of x only. We can evaluate
it by putting $x_1 = 0$, when we see that its value is cos x.
Now restoring $x_2$ we have

\[ \cos (x_1 + x_2) = \cos x_1 \cos x_2 - \sin x_1 \sin x_2. \qquad (13) \]

By differentiation with regard to $x_1$,

\[ \sin (x_1 + x_2) = \sin x_1 \cos x_2 + \cos x_1 \sin x_2. \qquad (14) \]

Now replace $x_1$ by x and $x_2$ by ½π. Then

\[ \cos (\tfrac{1}{2}\pi + x) = -\sin x; \qquad \sin (\tfrac{1}{2}\pi + x) = \cos x. \qquad (15) \]

Replace x now by ½π + x. Then

\[ \cos (\pi + x) = -\sin (\tfrac{1}{2}\pi + x) = -\cos x; \qquad \sin (\pi + x) = -\sin x. \qquad (16) \]

Now replace x by π + x, and we have

\[ \cos (2\pi + x) = \cos x; \qquad \sin (2\pi + x) = \sin x. \qquad (17) \]

Therefore the functions cos x and sin x have period 2π.

We have thus obtained from the analytic definitions the 
differential equation (4) satisfied by the cosine and sine; the 
relation (7) showing as a corollary that these functions cannot 
have absolute values exceeding i for real values of the 
argument; the addition formulae (13) and (14); and the 
periodic property (17). 
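
These results can be checked numerically from the series alone. The following Python sketch is an illustration; the function names, the number of terms retained and the bracketing interval (0, 2) for the first zero of the cosine are assumptions of the sketch. The library value `math.pi` is used only for comparison at the end.

```python
import math


def cos_series(x, terms=40):
    """Sum the first `terms` terms of series (1)."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= -x * x / ((2 * k + 1) * (2 * k + 2))
    return total


def sin_series(x, terms=40):
    """Sum the first `terms` terms of series (2)."""
    total, term = 0.0, x
    for k in range(terms):
        total += term
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))
    return total


def first_zero_of_cos(lo=0.0, hi=2.0, steps=60):
    """Bisect for the zero of cos_series, assumed to be bracketed by (lo, hi)."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if cos_series(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


half_pi = first_zero_of_cos()
a, b = 0.3, 0.9
identity = cos_series(a) ** 2 + sin_series(a) ** 2                       # relation (7)
addition = cos_series(a) * cos_series(b) - sin_series(a) * sin_series(b)  # right side of (13)
print(2 * half_pi, identity, addition - cos_series(a + b))
```

The zero found by bisection, doubled, agrees with the usual value of π to the accuracy of the arithmetic.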
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1. Approximation to f {1) ^Ci when r and I are 

large and x + = i 

We introduce Gauss’s H-function, defined for real values of 
u greater than — i by 


n 


When u is an integer 
Then 


roo 

(«) = 

JO 


e-* f^dt. 


n («) = tt ! . 




n(r) 


(1) 

( 2 ) 
( 3 ) 


n (/) ri (r - /)■ 

When $u$ is large, we have Stirling's approximation*

$$\Pi(u) = (2\pi)^{1/2}\, u^{u+1/2}\, e^{-u} \{1 + O(u^{-1})\}. \tag{4}$$
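The size of the neglected factor in (4) is easily examined numerically. The sketch below is illustrative only: it uses the standard relation $\Pi(u) = \Gamma(u+1)$ through Python's `math.gamma`, and the function names are chosen here for convenience.

```python
import math

def pi_function(u):
    """Gauss's Pi-function, Pi(u) = Gamma(u + 1), for real u > -1."""
    return math.gamma(u + 1.0)

def stirling(u):
    """Leading factor of (4): (2*pi)^(1/2) * u^(u + 1/2) * e^(-u)."""
    return math.sqrt(2.0 * math.pi) * u ** (u + 0.5) * math.exp(-u)

def relative_error(u):
    """Pi(u) / stirling(u) - 1, which (4) asserts to be O(1/u)."""
    return pi_function(u) / stirling(u) - 1.0
```

Comparing `relative_error(10)` with `relative_error(100)` shows the error falling roughly in proportion to $1/u$, as (4) requires.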

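Let

$$l = rx + \alpha r^{1/2} + \eta, \tag{5}$$

where $|\eta|$ is less than a number $k$ independent of $r$. Then

$$\Pi(l) = (2\pi)^{1/2} (rx + \alpha r^{1/2} + \eta)^{l+1/2}\, e^{-l} \{1 + O(r^{-1})\}, \tag{6}$$

$$\Pi(r - l) = (2\pi)^{1/2} (ry - \alpha r^{1/2} - \eta)^{r-l+1/2}\, e^{-(r-l)} \{1 + O(r^{-1})\}, \tag{7}$$

$$\begin{aligned}
\frac{1}{f(l)} = \frac{\Pi(l)\, \Pi(r-l)}{\Pi(r)\, x^{l} y^{r-l}}
&= (2\pi r x y)^{1/2} \left(1 + \frac{\alpha}{x r^{1/2}} + \frac{\eta}{rx}\right)^{l+1/2} \\
&\quad \times \left(1 - \frac{\alpha}{y r^{1/2}} - \frac{\eta}{ry}\right)^{r-l+1/2} \{1 + O(r^{-1})\}.
\end{aligned} \tag{8}$$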

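Now

$$\begin{aligned}
(l + \tfrac12) \log \left(1 + \frac{\alpha}{x r^{1/2}} + \frac{\eta}{rx}\right)
&= (rx + \alpha r^{1/2} + \eta + \tfrac12) \left(\frac{\alpha}{x r^{1/2}} + \frac{\eta}{rx} - \frac{\alpha^2}{2x^2 r} + \dots\right) \\
&= \alpha r^{1/2} + \frac{\alpha^2}{2x} + \eta + O(r^{-1/2}),
\end{aligned} \tag{9}$$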


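* Cf. Whittaker and Watson, Modern Analysis, 12·33.

$$(r - l + \tfrac12) \log \left(1 - \frac{\alpha}{y r^{1/2}} - \frac{\eta}{ry}\right) = -\alpha r^{1/2} + \frac{\alpha^2}{2y} - \eta + O(r^{-1/2}), \tag{10}$$

and therefore

$$\log \frac{1}{f(l)} = \tfrac12 \log (2\pi r x y) + \frac{\alpha^2}{2xy} + O(r^{-1/2}), \tag{11}$$

so that

$$f(l) = (2\pi r x y)^{-1/2} \exp \left(-\frac{\alpha^2}{2xy}\right) \{1 + O(r^{-1/2})\}.{}^{*} \tag{12}$$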
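The approximation (12) may be compared with the exact binomial term. What follows is a sketch under stated assumptions: $\eta$ is taken as zero, so that $\alpha = (l - rx)/r^{1/2}$, and the exact term is evaluated through logarithms of the gamma function to avoid overflow. The names `f_exact`, `f_approx` and `ratio` are illustrative.

```python
import math

def f_exact(l, r, x):
    """f(l) = C(r, l) x^l y^(r-l) with y = 1 - x, computed via logarithms."""
    y = 1.0 - x
    log_f = (math.lgamma(r + 1) - math.lgamma(l + 1) - math.lgamma(r - l + 1)
             + l * math.log(x) + (r - l) * math.log(y))
    return math.exp(log_f)

def f_approx(l, r, x):
    """Leading factor of (12), omitting {1 + O(r^(-1/2))}; eta assumed zero."""
    y = 1.0 - x
    alpha = (l - r * x) / math.sqrt(r)
    return math.exp(-alpha ** 2 / (2.0 * x * y)) / math.sqrt(2.0 * math.pi * r * x * y)

def ratio(l, r, x):
    """Exact term divided by its approximation; should tend to 1 as r grows."""
    return f_exact(l, r, x) / f_approx(l, r, x)
```

With $x = 0.3$ and $\alpha = 0.5$ held fixed, `ratio(130, 400, 0.3)` and `ratio(12100, 40000, 0.3)` illustrate how the discrepancy diminishes as $r$ increases.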


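II. Approximation to $g(l) = {}^{r}C_{l}\, {}^{n-r}C_{m-l} \big/ {}^{n}C_{m}$ when $m$, $r$ are large

If we consider the expressions

$${}^{r}C_{l}\, x^{l} y^{r-l}, \qquad {}^{n-r}C_{m-l}\, x^{m-l} y^{n-r-m+l},$$

where $x + y = 1$ and $x$ is arbitrary, we can choose $x$ so that
the maxima of both expressions occur for the same value of
$l$, say $l_0$. It is necessary that

$$l_0 = rx; \qquad m - l_0 = (n - r)x. \tag{1}$$

Hence

$$x = m/n; \qquad y = (n - m)/n, \tag{2}$$

$$\begin{aligned}
l_0 = rm/n; \qquad m - l_0 &= (n - r)m/n = (n - r)x, \\
r - l_0 &= r(n - m)/n = ry; \\
(n - r) - (m - l_0) &= (n - r)(n - m)/n = (n - r)y.
\end{aligned} \tag{3}$$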
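Now put $l = l_0 + p$. By Lemma I

$${}^{r}C_{l}\, x^{l} y^{r-l} = (2\pi r x y)^{-1/2} \exp \left\{-\frac{p^2}{2rxy}\right\}, \tag{4}$$

$${}^{n-r}C_{m-l}\, x^{m-l} y^{n-r-m+l} = \{2\pi (n-r) x y\}^{-1/2} \exp \left\{-\frac{p^2}{2(n-r)xy}\right\}. \tag{5}$$

* I am indebted for this proof to Mr Newman; it replaces a somewhat
longer one due to Bromwich, Phil. Mag. 38, 1919, 231-235.

Whence by multiplication

$${}^{r}C_{l}\, {}^{n-r}C_{m-l}\, x^{m} y^{n-m} = (2\pi x y)^{-1} \{r(n-r)\}^{-1/2} \exp \left\{-\frac{np^2}{2r(n-r)xy}\right\}. \tag{6}$$

But by (4) of Lemma I

$${}^{n}C_{m} = (2\pi)^{-1/2} \frac{n^{n+1/2}}{m^{m+1/2} (n-m)^{n-m+1/2}} = (2\pi n x y)^{-1/2}\, x^{-m} y^{-(n-m)}, \tag{7}$$

whence

$$g(l) = \left\{\frac{n}{2\pi x y\, r(n-r)}\right\}^{1/2} \exp \left\{-\frac{np^2}{2r(n-r)xy}\right\}. \tag{8}$$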
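A numerical comparison of (8) with the exact value of $g(l)$ may be made as follows. This is a sketch only; it takes $g(l)$ to be the ratio ${}^{r}C_{l}\, {}^{n-r}C_{m-l} / {}^{n}C_{m}$, which is the reading assumed throughout this lemma, and the names `g_exact` and `g_approx` are illustrative.

```python
import math

def g_exact(l, m, n, r):
    """g(l) = C(r, l) * C(n - r, m - l) / C(n, m), using exact integers."""
    return math.comb(r, l) * math.comb(n - r, m - l) / math.comb(n, m)

def g_approx(l, m, n, r):
    """Right-hand side of (8), with x = m/n, y = 1 - x and p = l - r*m/n."""
    x = m / n
    y = 1.0 - x
    p = l - r * x
    scale = r * (n - r) * x * y / n      # the quantity r(n - r)xy/n
    return math.exp(-p * p / (2.0 * scale)) / math.sqrt(2.0 * math.pi * scale)
```

For instance, with $n = 2000$, $m = 800$, $r = 500$ the maximum falls at $l_0 = 200$, and `g_exact` and `g_approx` may be compared there and at neighbouring values of $l$.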


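Consider now

$$\sum_{p=0}^{p} g(l). \tag{9}$$

When $r(n - r)xy/n$ is large this sum can be replaced approximately
by an integral; when $p$ increases by 1, one term
is added to the sum. Hence

$$\sum_{p=0}^{p} g(l) = \int_0^{p} \left\{\frac{n}{2\pi x y\, r(n-r)}\right\}^{1/2} \exp \left\{-\frac{np^2}{2r(n-r)xy}\right\} dp. \tag{10}$$

Put

$$p = \{2r(n-r)xy/n\}^{1/2}\, \xi = \{2r(n-r)m(n-m)/n^3\}^{1/2}\, \xi. \tag{11}$$

Then

$$\sum_{p=0}^{p} g(l) = \pi^{-1/2} \int_0^{\xi} e^{-\xi^2}\, d\xi \tag{12}$$

$$= \tfrac12 \operatorname{erf} \xi, \tag{13}$$

where $\operatorname{erf} \xi$ denotes the error function defined by

$$\operatorname{erf} \xi = \frac{2}{\sqrt{\pi}} \int_0^{\xi} e^{-t^2}\, dt. \tag{14}$$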


Then (13) is the probability that $l$ lies between $rm/n$ and
$rm/n + p$. The probability that it lies between $0$ and $r$ is
unity; but since $1 - \operatorname{erf} \xi$ is insignificant for moderately
large positive values of $\xi$ and $1 + \operatorname{erf} \xi$ insignificant for
moderately large negative values of $\xi$, this is equivalent to

$$\tfrac12 \operatorname{erf} (+\infty) - \tfrac12 \operatorname{erf} (-\infty) = 1,$$

which is true.
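The replacement of the sum by $\tfrac12 \operatorname{erf} \xi$ can be illustrated numerically. The sketch below again assumes $g(l) = {}^{r}C_{l}\, {}^{n-r}C_{m-l} / {}^{n}C_{m}$ and supposes that $rm/n$ is a whole number; the function names are illustrative. Since the sum counts the term $p = 0$ in full, it may be expected to exceed the integral by about half an end term, a difference that becomes small when $r(n - r)xy/n$ is large.

```python
import math

def g_exact(l, m, n, r):
    """g(l) = C(r, l) * C(n - r, m - l) / C(n, m), using exact integers."""
    return math.comb(r, l) * math.comb(n - r, m - l) / math.comb(n, m)

def partial_sum(p_max, m, n, r):
    """Sum of g(l0 + p) for p = 0, 1, ..., p_max, with l0 = r*m/n assumed integral."""
    l0 = (r * m) // n
    return sum(g_exact(l0 + p, m, n, r) for p in range(p_max + 1))

def half_erf(p_max, m, n, r):
    """(1/2) erf(xi), with xi obtained from p_max through the substitution (11)."""
    x = m / n
    y = 1.0 - x
    xi = p_max / math.sqrt(2.0 * r * (n - r) * x * y / n)
    return 0.5 * math.erf(xi)

def total_probability(m, n, r):
    """Sum of g(l) over every admissible l from 0 to min(r, m)."""
    return sum(g_exact(l, m, n, r) for l in range(0, min(r, m) + 1))
```

With $n = 20000$, $m = 8000$, $r = 5000$, the values of `partial_sum(30, ...)` and `half_erf(30, ...)` agree to within about one unit in the second decimal place.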



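III. Evaluation of

$$\sum_{r=l}^{n} {}^{r}C_{l}\, {}^{n-r}C_{m-l} \qquad \text{and} \qquad \sum_{r=l}^{n} {}^{r}C_{l}\, {}^{n-r}C_{m-l}\, \frac{r-l}{n-m}$$

We have

$${}^{r}C_{l} = \frac{r!}{l!\,(r-l)!} = \frac{(l+1)(l+2)\cdots r}{1 \cdot 2 \cdot 3 \cdots (r-l)},$$

which is the coefficient of $x^{r-l}$ in the binomial expansion
of $(1 - x)^{-(l+1)}$. Also, similarly, ${}^{n-r}C_{m-l}$ is the coefficient
of $x^{n-r-m+l}$ in the binomial expansion of $(1 - x)^{-(m-l+1)}$.
Hence, by multiplying the two expansions, we see that
$\sum_{r=l}^{n} {}^{r}C_{l}\, {}^{n-r}C_{m-l}$ is the coefficient of $x^{(r-l)+(n-r-m+l)}$ in the
expansion of $(1 - x)^{-(m+2)}$; the coefficient, that is, of $x^{n-m}$.
But this coefficient is

$$\frac{(m+2)(m+3)\cdots(n+1)}{1 \cdot 2 \cdots (n-m)} = \frac{(n+1)!}{(m+1)!\,(n-m)!}.$$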
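This result is readily confirmed by direct summation for small values. The following sketch uses exact integer arithmetic; the names `lemma_sum` and `closed_form` are illustrative.

```python
import math

def lemma_sum(l, m, n):
    """Sum over r = l, ..., n of C(r, l) * C(n - r, m - l)."""
    # math.comb returns 0 when the lower index exceeds the upper,
    # so terms with m - l > n - r contribute nothing.
    return sum(math.comb(r, l) * math.comb(n - r, m - l) for r in range(l, n + 1))

def closed_form(m, n):
    """(n + 1)! / ((m + 1)! (n - m)!), the coefficient found above."""
    return math.factorial(n + 1) // (math.factorial(m + 1) * math.factorial(n - m))
```

It may be noticed that the closed form does not involve $l$, so that `lemma_sum(l, m, n)` gives the same value for every $l$ from $0$ to $m$.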

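Also

$${}^{r}C_{l}\, \frac{r-l}{n-m} = \frac{r!}{l!\,(r-l)!}\, \frac{r-l}{n-m} = \frac{r!}{(l+1)!\,(r-l-1)!}\, \frac{l+1}{n-m}.$$

Whence

$$\begin{aligned}
\sum_{r=l}^{n} {}^{r}C_{l}\, {}^{n-r}C_{m-l}\, \frac{r-l}{n-m}
&= \frac{l+1}{n-m} \sum_{r=l}^{n} {}^{r}C_{l+1}\, {}^{n-r}C_{m-l} \\
&= \frac{l+1}{n-m}\, \frac{(n+1)!}{(m+2)!\,(n-m-1)!} \quad \text{by the last result} \\
&= \frac{(l+1)\,(n+1)!}{(n-m)!\,(m+2)!}.
\end{aligned}$$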
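The second sum may be verified in the same way. The sketch below uses exact rational arithmetic and supposes $m < n$, so that the divisor $n - m$ does not vanish; the names are illustrative.

```python
import math
from fractions import Fraction

def weighted_sum(l, m, n):
    """Sum over r = l, ..., n of C(r, l) * C(n - r, m - l) * (r - l)/(n - m)."""
    return sum(
        Fraction(r - l, n - m) * math.comb(r, l) * math.comb(n - r, m - l)
        for r in range(l, n + 1)
    )

def closed_form(l, m, n):
    """(l + 1) (n + 1)! / ((n - m)! (m + 2)!)."""
    return Fraction(
        (l + 1) * math.factorial(n + 1),
        math.factorial(n - m) * math.factorial(m + 2),
    )
```

For example, with $n = 3$, $m = 1$, $l = 1$ the terms are $0$, $1$ and $3$, whose sum may be compared with `closed_form(1, 1, 3)`.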